Real-world sequential decision-making problems often involve parameterized action spaces, which require both decisions about which discrete action to take and decisions about the continuous action parameters governing how that action is executed. However, existing approaches exhibit severe limitations when handling such parameterized action spaces: planning algorithms require hand-crafted action models, and reinforcement learning (RL) paradigms focus on either discrete or continuous actions but not both. This paper extends the scope of RL algorithms to long-horizon, sparse-reward settings with parameterized actions through autonomously learned state and action abstractions. We present algorithms for online learning and flexible refinement of such abstractions during RL. Empirical results show that learning such abstractions on-the-fly enables $TD(\lambda)$ to significantly outperform state-of-the-art RL approaches in terms of sample efficiency across diverse problem domains with long horizons, continuous states, and parameterized actions.
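
For concreteness, the sketch below illustrates the notion of a parameterized action: a discrete action type paired with continuous parameters that govern its execution. This is a minimal, hypothetical illustration; the class and function names are assumptions for exposition, not the paper's implementation.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class ParameterizedAction:
    """A discrete action choice paired with continuous execution parameters."""
    action_id: int             # discrete decision, e.g. "move" vs. "grasp"
    params: Tuple[float, ...]  # continuous decision, e.g. a target (x, y)

def sample_action(num_discrete: int,
                  param_bounds: List[Tuple[float, float]]) -> ParameterizedAction:
    """Uniformly sample one parameterized action (illustrative baseline only)."""
    action_id = random.randrange(num_discrete)
    params = tuple(random.uniform(lo, hi) for (lo, hi) in param_bounds)
    return ParameterizedAction(action_id, params)

# Example: 3 discrete action types, each parameterized by a 2-D target in [0, 1]^2.
print(sample_action(num_discrete=3, param_bounds=[(0.0, 1.0), (0.0, 1.0)]))
```

An agent in such a space must jointly select the discrete `action_id` and the continuous `params`, which is precisely the setting where purely discrete or purely continuous RL methods fall short.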
