Singapore

Multi-task imitation learning (MTIL) has shown significant potential in robotic manipulation by enabling agents to perform various tasks using a single policy. It simplifies the policy deployment and enhances the agent&#39;s adaptability across different scenarios. However, key challenges remain, such as maintaining action reliability (e.g., avoiding abnormal action sequences that deviate from nominal task trajectories) and generalizing to unseen tasks with a few expert demonstrations. To address these challenges, we introduce the Foresight-Augmented Manipulation (FoAM) policy, a novel MTIL policy that pioneers the use of multi-modal goal conditions as input and introduces a foresight augmentation in addition to the general action reconstruction. FoAM enables the agent to reason about its actions&#39; visual consequences (foresight) and to be guided by these more expressive representations during task execution. Extensive experiments on over 100 tasks in simulation and real-world settings demonstrate that FoAM significantly enhances MTIL policy performance, outperforming state-of-the-art baselines by up to 41% in success rate. Meanwhile, we released our simulation suites, including 10 scenarios and over 80 challenging tasks designed for manipulation policy training and evaluation. Please see the supplementary materials for details.

AAAI 2026

FoAM: Foresight-Augmented Multi-Task Imitation Policy for Robotic Manipulation

learn from demonstration

multi-task imitation learning

robotics

Multi-task imitation learning (MTIL) has shown significant potential in robotic manipulation by enabling agents to perform various tasks using a single policy. It simplifies the policy deployment and enhances the agent's adaptability across different scenarios. However, key challenges remain, such as maintaining action reliability (e.g., avoiding abnormal action sequences that deviate from nominal task trajectories) and generalizing to unseen tasks with a few expert demonstrations. To address these challenges, we introduce the Foresight-Augmented Manipulation (FoAM) policy, a novel MTIL policy that pioneers the use of multi-modal goal conditions as input and introduces a foresight augmentation in addition to the general action reconstruction. FoAM enables the agent to reason about its actions' visual consequences (foresight) and to be guided by these more expressive representations during task execution. Extensive experiments on over 100 tasks in simulation and real-world settings demonstrate that FoAM significantly enhances MTIL policy performance, outperforming state-of-the-art baselines by up to 41% in success rate. Meanwhile, we released our simulation suites, including 10 scenarios and over 80 challenging tasks designed for manipulation policy training and evaluation. Please see the supplementary materials for details.

technical paper

We are pleased to announce the Fortieth AAAI Conference on Artificial Intelligence (AAAI-26), which will be held in Singapore EXPO from January 20 to January 27, 2026.

The purpose of the AAAI conference series is to promote research in Artificial Intelligence (AI) and foster scientific exchange between researchers, practitioners, scientists, students, and engineers across the entirety of AI and its affiliated disciplines. AAAI-26 will feature technical paper presentations, special tracks, invited speakers, workshops, tutorials, poster sessions, senior member presentations, competitions, and exhibit programs, and a range of other activities to be announced.<br><br>

To access this event page, you need to log in with the **email address you registered with**. <br>Access credentials will be sent to your email from Underline -  subject line "Welcome to AAAI 2026". Please be sure to check your spam email folder if you do not see an email confirmation right away.

Please log in

To access this event page, you are required to register.
Please complete your registration to continue.

We recommend reading [**the registration information**](https://aaai.org/conference/aaai/aaai-26/registration/) first.

**Online Registration Form**: https://aaai.getregistered.net/conference-2026 

Registration Required

We are pleased to announce the Fortieth AAAI Conference on Artificial Intelligence (AAAI-26), which will be held in Singapore EXPO from January 20 to January 27, 2026.

The objective of this study is to advance the optimization of hybrid electricity markets using multi-agent reinforcement learning (MARL). The transition from centralized systems to public–private models introduces significant challenges, including the emergence of independent market players and the increasing integration of renewable energy sources (RESs). These challenges are further intensified by rapidly shifting demand patterns, driven both by energy-intensive data centers and AI inference workloads, as well as by political and societal instabilities.

To address these complexities, we develop a formal model of market participants’ behavior and propose a MARL-based framework for optimizing system operator strategies. This framework incorporates dynamic pricing and dispatch scheduling to minimize operational costs, maintain grid stability, and align market incentives. We also present a new, adaptable simulation environment compatible with state-of-the-art MARL methods. Empirical evaluations in increasingly complex scenarios demonstrate the effectiveness of our approach in capturing the dynamic and decentralized nature of modern electricity markets.

Multi-Agent Reinforcement Learning for Modeling, Simulating, and Optimizing Energy Markets

Episodic tasks in Reinforcement Learning (RL) often pose challenges due to sparse reward signals and high-dimensional state spaces, which hinder efficient learning. Additionally, these tasks often feature hidden “trap states”—irreversible failures that prevent task completion but do not provide explicit negative rewards to guide agents away from repeated errors. To address these issues, we propose Time-Weighted Contrastive Reward Learning (TW-CRL), an Inverse Reinforcement Learning (IRL) framework that leverages both successful and failed demonstrations. By incorporating temporal information, TW-CRL learns a dense reward function that identifies critical states associated with success or failure. This approach not only enables agents to avoid trap states but also encourages meaningful exploration beyond simple imitation of expert trajectories. Empirical evaluations on navigation tasks and robotic manipulation benchmarks demonstrate that TW-CRL surpasses state-of-the-art methods, achieving improved efficiency and robustness.

TW-CRL: Time-Weighted Contrastive Reward Learning for Efficient Inverse Reinforcement Learning

With the rapid development of generative AI, image steganography has garnered widespread attention due to its unique concealment. Recent studies have demonstrated the practical advantages of Fixed Neural Network Steganography (FNNS), notably its ability to achieve stable information embedding and extraction without any additional network training. However, the stego images generated by FNNS still exhibit noticeable distortion and limited robustness. These drawbacks compromise the security of the embedded information and restrict the practical applicability of the method. To address these limitations, we propose Robust Fixed Neural Network Steganography (RFNNS). Specifically, a texture-aware localization technique selectively embeds perturbations carrying secret information into regions of complex textures, effectively preserving visual quality. Additionally, a robust steganographic perturbation generation (RSPG) strategy is designed to enhance the decoding accuracy, even under common and unknown attacks. These robust perturbations are combined with AI-generated cover images to produce stego images. Experimental results demonstrate that RFNNS significantly improves robustness compared to state-of-the-art FNNS methods, achieving an average increase in SSIM of 23\% for recovered secret images under common attacks. Furthermore, the LPIPS value of recovered secrets images against previously unknown attacks achieved by RFNNS was reduced to 39\% of the SOTA method, underscoring its practical value for covert communication.

RFNNS: Robust Fixed Neural Network Steganography with Universal Text-to-Image Models

Existing Grammatical Error Correction (GEC) systems suffer from limited reference diversity, leading to underestimated evaluation and restricted model generalization. 
To address this issue, we introduce the **Judge of Edit-Level Validity (JELV)**, an automated framework to validate correction edits from grammaticality, faithfulness, and fluency. 
Using our proposed human-annotated Pair-wise Edit-level Validity Dataset (PEVData) as benchmark, JELV offers two implementations: a multi-turn LLM-as-Judges pipeline achieving 90\% agreement with human annotators, and a distilled DeBERTa classifier with 85\% precision on valid edits. 
We then apply JELV to reclassify misjudged false positives in evaluation and derive a comprehensive evaluation metric by integrating false positive decoupling and fluency scoring, resulting in state-of-the-art correlation with human judgments.
We also apply JELV to filter LLM-generated correction candidates, expanding the BEA19's single-reference dataset containing 38,692 source sentences. Retraining top GEC systems on this expanded dataset yields measurable performance gains. JELV provides a scalable solution for enhancing reference diversity and strengthening both evaluation and model generalization.

JELV: A Judge of Edit-Level Validity for Evaluation and Automated Reference Expansion in Grammatical Error Correction

We present a method for the unattended gray-box
identification of sensor models commonly used by
localization algorithms in the field of robotics. The
objective is to determine the most likely sensor model for
a time series of unknown measurement data, given an
extendable catalog of predefined sensor models. Sensor
model definitions may require states for rigid-body
calibrations and dedicated reference frames to replicate a
measurement based on the robot’s localization state. A
health metric is introduced, which verifies the outcome of
the selection process in order to detect false positives
and facilitate reliable decision-making. In the second
stage, an initial guess for identified calibration states
is generated, and the necessity of sensor world reference
frames is evaluated. The identified sensor model with its
parameter information is then used to parameterize and
initialize a state estimation application, thus ensuring a
more accurate and robust integration of new sensor
elements. This method is helpful for inexperienced users
who want to identify the source and type of a measurement,
sensor calibrations, or sensor reference frames. It will
also be important in the field of modular multiagent
scenarios and modularized robotic platforms that are
augmented by sensor modalities during runtime. Overall,
this work aims to provide a simplified integration of
sensor modalities to downstream applications and circumvent
common pitfalls in the usage and development of
localization approaches.

Sensor Model Identification via Simultaneous Model
Selection and State Variable Determination

Estimating the geometric and volumetric properties of transparent deformable liquids is challenging due to optical complexities and dynamic surface deformations induced by container movements. Autonomous robots performing precise liquid manipulation tasks—such as dispensing, aspiration, and mixing—must handle containers in ways that inevitably induce these deformations, complicating accurate liquid state assessment. Current datasets lack comprehensive physics-informed simulation data representing realistic liquid behaviors under diverse dynamic scenarios. To bridge this gap, we introduce Phys-Liquid, a physics-informed dataset comprising 97,200 simulation images and corresponding 3D meshes, capturing liquid dynamics across multiple laboratory scenes, lighting conditions, liquid colors, and container rotations. To validate the realism and effectiveness of Phys-Liquid, we propose a four-stage reconstruction and estimation pipeline involving liquid segmentation, multi-view mask generation, 3D mesh reconstruction, and real-world scaling. Experimental results demonstrate improved accuracy and consistency in reconstructing liquid geometry and volume, outperforming existing benchmarks. The dataset and associated validation methods facilitate future advancements in transparent liquid perception tasks. The dataset and code will be released.

Phys-Liquid: A Physics-Informed Dataset for Estimating 3D Geometry and Volume of Transparent Deformable Liquids

While transformers excel in many settings, their application in the field of automated planning is limited. Prior work like PlanGPT, a state-of-the-art decoder-only transformer, struggles with extrapolation from easy to hard planning problems. This in turn stems from problem symmetries: planning tasks can be represented with arbitrary variable names that carry no meaning beyond being identifiers. This causes a combinatorial explosion of equivalent representations that pure transformers cannot efficiently learn from. We propose a novel contrastive learning objective to make transformers symmetry-aware and thereby compensate for their lack of inductive bias. Combining this with architectural improvements, we show that transformers can be efficiently trained for either plan-generation or heuristic-prediction. Our results across multiple planning domains demonstrate that our symmetry-aware training effectively and efficiently addresses the limitations of PlanGPT.

Symmetry-Aware Transformer Training for Automated Planning

Proprietary models are increasingly dominating the race for ever-larger language models. Can open-source, smaller models remain competitive across a broad range of tasks? 
In this paper, we present the Avengers---a lightweight framework that leverages the collective intelligence of these smaller models. The Avengers builds upon four lightweight operations: (i) embedding: encode queries using a text embedding model;
(ii) clustering: group queries based on their semantic similarity;
(iii) scoring: scores each model's performance within each cluster;
and (iv) voting: improve outputs via repeated sampling and voting.
At inference time, each query is embedded and assigned to its nearest cluster. The top-performing model(s) within that cluster are selected to generate the response with repeated sampling.
Remarkably, with 10 open-source models (~7B parameters each), the Avengers surpasses GPT-4o, 4.1, and 4.5 in average performance across 15 diverse datasets spanning mathematics, coding, logical reasoning, general knowledge, and affective tasks. In particular, it surpasses GPT-4.1 on mathematics tasks by 18.21% and on code tasks by 7.46%. 
Furthermore, the Avengers delivers superior out-of-distribution generalization, and remains robust across various embedding models, clustering algorithms, ensemble strategies, data efficiency, and values of its sole parameter---the number of clusters.

The Avengers: A Routing Recipe for Collective Intelligence in Language Models

Policy gradient methods, which have been extensively studied in the last decade, offer an effective and efficient framework for reinforcement learning problems. However, their performances can often be unsatisfactory, suffering from unreliable reward improvements and slow convergence, due to high variance in gradient estimations. In this paper, we propose a universal reward profiling framework that can be seamlessly integrated with any policy gradient algorithm, where we selectively update the policy based on \emph{high-confidence performance estimations}. We theoretically justify that our technique will not slow down the convergence of the baseline policy gradient methods, but with high probability, will result in stable and monotonic improvements of their performance. Empirically, on eight continuous‐control benchmarks (Box2D and MuJoCo/PyBullet), our profiling yields up to $1.5\times$ faster convergence to near‐optimal returns, up to $1.75\times$ reduction in return variance on some setups. Our profiling approach offers a general, theoretically grounded path to more reliable and efficient policy learning in complex environments.

Stabilizing Policy Gradient Methods via Reward Profiling

Recent advancements in Large Language Models (LLMs) has lead to the development of agents capable of complex reasoning and interaction with external tools. In enterprise contexts, the effective use of such tools that are often enabled by application programming interfaces (APIs) is hindered by poor documentation, complex input or output schema, and large number of operations. These challenges make tool selection difficult and reduce the accuracy of payload formation upto 25%. We propose ACE, an automated tool creation and enrichment framework that transforms enterprise APIs into LLM-compatible tools. ACE (i) generates enriched tool specifications with parameter descriptions and examples to improve selection and invocation accuracy, and (ii) incorporates a dynamic shortlisting mechanism that filters relevant tools at runtime, reducing prompt complexity while maintaining scalability. We validate our framework on both proprietary and open-source APIs and demonstrate its integration with agentic frameworks. To the best of our knowledge, ACE is the first end-to-end framework that automates the creation, enrichment, and dynamic selection of enterprise API tools for LLM agents.

Downloads

Next from AAAI 2026

Multi-Agent Reinforcement Learning for Modeling, Simulating, and Optimizing Energy Markets

Stay up to date with the latest Underline news!

PRESENTATIONS

CONFERENCES

COMPANY

RESOURCES