Navigating real-world urban environments using natural language instructions introduces unique challenges, such as ambiguous spatial references, diverse landmark types, and dynamic street scenes. Existing approaches often rely on synthetic environments or simplified goal formats, failing to generalize to city-scale, language-driven navigation. To address these limitations, we present UrbanNav, a large-scale framework for training embodied agents to follow free-form language commands in complex urban settings. We leverage web-scale human navigation videos and introduce a multimodal supervision pipeline that aligns visual trajectories with automatically extracted language instructions grounded in real-world landmarks. UrbanNav comprises over 1,500 hours of city navigation data and 3 million grounded instruction-landmark pairs, covering diverse urban contexts. Experiments demonstrate that agents trained with UrbanNav exhibit improved spatial reasoning, robustness to ambiguous commands, and generalization to unseen real-world urban layouts. Our work highlights the importance of large-scale, language-grounded supervision for enabling practical deployment of language-guided robots in real-world cities.
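To make the data format concrete, the grounded instruction-landmark pairs described above might look like the following minimal Python sketch. All class and field names here are illustrative assumptions, not UrbanNav's actual schema.

```python
from dataclasses import dataclass

# Hypothetical record types; the real UrbanNav schema is not specified
# in the abstract, so these names and fields are assumptions.
@dataclass
class Landmark:
    name: str            # e.g. "red brick church"
    category: str        # e.g. "building", "storefront", "intersection"
    frame_index: int     # video frame where the landmark is visible
    bbox: tuple          # (x_min, y_min, x_max, y_max) in pixels

@dataclass
class InstructionPair:
    instruction: str     # free-form command extracted from the video
    landmarks: list      # Landmark objects grounding the instruction
    trajectory: list     # sequence of (x, y, heading) poses along the route

def landmarks_in_window(pair, start, end):
    """Return landmarks whose grounding frame falls inside [start, end)."""
    return [lm for lm in pair.landmarks if start <= lm.frame_index < end]

# Example: one instruction grounded in a single visual landmark.
pair = InstructionPair(
    instruction="turn left after the red brick church",
    landmarks=[Landmark("red brick church", "building", 120, (40, 60, 200, 300))],
    trajectory=[(0.0, 0.0, 90.0), (5.0, 0.0, 90.0), (5.0, 5.0, 180.0)],
)
print(len(landmarks_in_window(pair, 100, 150)))  # 1
```

A windowed lookup like `landmarks_in_window` is one plausible way the supervision pipeline could align a language instruction with the segment of the visual trajectory where its landmark is actually observed.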
