Singapore

In environments with sparse or delayed rewards,
reinforcement learning (RL) incurs high sample complexity
due to the large number of interactions needed for
learning. This limitation has motivated the use of large
language models (LLMs) for subgoal discovery and trajectory
guidance. While LLMs can support exploration, frequent
reliance on LLM calls raises concerns about scalability and
reliability. We address these challenges by constructing a
memory graph that encodes subgoals and trajectories from
both LLM guidance and the agent’s own successful rollouts.
From this graph, we derive a utility function that
evaluates how closely the agent’s trajectories align with
prior successful strategies. This utility shapes the
advantage function, providing the critic with additional
guidance without altering the reward. Our method relies
primarily on offline input and only occasional online
queries, avoiding dependence on continuous LLM supervision.
Preliminary experiments in benchmark environments show
improved sample efficiency and faster early learning
compared to baseline RL methods, with final returns
comparable to methods that require frequent LLM interaction.
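
The abstract does not specify how the utility is computed or how it enters the advantage, so the following is only a minimal sketch under stated assumptions. The `MemoryGraph` class, the edge-matching utility, the additive rule `A' = A + beta * U`, and the `beta` parameter are all hypothetical names and choices introduced here for illustration; they are not taken from the paper.

```python
from collections import defaultdict


class MemoryGraph:
    """Hypothetical memory graph: nodes are subgoals, edges are observed transitions."""

    def __init__(self):
        self.edges = defaultdict(int)  # (src, dst) -> number of successful traversals

    def add_trajectory(self, subgoals):
        """Record a successful subgoal sequence (from LLM guidance or a rollout)."""
        for src, dst in zip(subgoals, subgoals[1:]):
            self.edges[(src, dst)] += 1

    def utility(self, subgoals):
        """Assumed utility: fraction of transitions that match a stored edge."""
        steps = list(zip(subgoals, subgoals[1:]))
        if not steps:
            return 0.0
        return sum(1 for step in steps if step in self.edges) / len(steps)


def shape_advantages(advantages, utility, beta=0.5):
    """Assumed additive shaping: A' = A + beta * U. Rewards are not modified."""
    return [a + beta * utility for a in advantages]


graph = MemoryGraph()
graph.add_trajectory(["start", "key", "door", "goal"])  # e.g. offline LLM guidance
graph.add_trajectory(["start", "key", "chest"])         # e.g. a successful rollout

rollout = ["start", "key", "door", "wall"]
u = graph.utility(rollout)  # two of the three transitions match stored edges
shaped = shape_advantages([0.1, -0.2, 0.0], u, beta=0.3)
```

Because the utility is added to the advantage rather than to the reward, the environment's return is unchanged in this sketch; only the policy-gradient signal is biased toward transitions seen in earlier successes.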

AAAI 2026

Memory Based Advantage Shaping for LLM-Guided Reinforcement Learning (Student Abstract)

memory-based learning

llms

sample efficiency

reinforcement learning

poster

We are pleased to announce the Fortieth AAAI Conference on Artificial Intelligence (AAAI-26), which will be held in Singapore EXPO from January 20 to January 27, 2026.

The purpose of the AAAI conference series is to promote research in Artificial Intelligence (AI) and foster scientific exchange between researchers, practitioners, scientists, students, and engineers across the entirety of AI and its affiliated disciplines. AAAI-26 will feature technical paper presentations, special tracks, invited speakers, workshops, tutorials, poster sessions, senior member presentations, competitions, and exhibit programs, and a range of other activities to be announced.


Although centralized training with centralized execution
(CTCE) excels at multi-agent coordination, its reliance on
global information limits its use in the real world.
Conversely, the more practical centralized training with
decentralized execution (CTDE) paradigm often struggles with
complex coordination. This
paper bridges this critical gap by introducing the
Centralized-to-Decentralized (CtoD) learning concept: a
novel framework for transferring the knowledge of a
powerful centralized policy into a robust, practical
decentralized policy. Our method, CtoD-MAT, realizes this
transition through a curriculum that gradually shifts
agents from centralized to decentralized control. A key
innovation is our dynamic scheduling mechanism, featuring a
mediator module, which ensures a robust and effective
knowledge transfer. Using challenging SMAC benchmarks, we
demonstrate that CtoD-MAT successfully produces competitive
decentralized policies, notably solving complex
coordination tasks that are difficult for standard CTDE
methods.

CtoD-MAT: Bridging Centralized and Decentralized Execution in Multi-Agent Reinforcement Learning (Student Abstract)
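
The idea of a curriculum that moves agents from centralized to decentralized control can be illustrated with a toy schedule. This is a sketch only: the abstract describes a dynamic scheduling mechanism with a mediator module, whereas the code below substitutes a fixed linear schedule for simplicity. The function names, the `warmup` parameter, and the rule for choosing which agents switch are assumptions made here.

```python
def decentralized_fraction(step, total_steps, warmup=0.2):
    """Assumed linear schedule: 0.0 during warmup, then rising linearly to 1.0."""
    start = warmup * total_steps
    if step <= start:
        return 0.0
    return min(1.0, (step - start) / (total_steps - start))


def assign_control(num_agents, fraction):
    """Mark the first round(fraction * n) agents as decentralized (illustrative rule)."""
    k = round(fraction * num_agents)
    return ["decentralized" if i < k else "centralized" for i in range(num_agents)]


# Halfway through the post-warmup phase, half of the agents act on local information.
modes = assign_control(4, decentralized_fraction(600, 1000))
```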

Automated mosquito species identification is critical for combating vector-borne diseases. We introduce Q-MoFusion, a novel hybrid quantum-classical framework that fuses deep features from pre-trained Audio Spectrogram Transformer (AST) and Whisper models using a Variational Quantum Circuit (VQC). Our approach significantly outperforms individual backbones and prior state-of-the-art benchmarks, demonstrating superior accuracy and robustness, particularly on imbalanced classes. Q-MoFusion demonstrates the potential of hybrid quantum computing to enhance bioacoustic surveillance for addressing critical public health challenges.

Q-MoFusion: A Quantum Classifier for Mosquito Species Classification (Student Abstract)

Traditional recommenders often fail to disentangle the
motivations behind user choices. To address this, we
propose MV-LLMRec, a framework that models interactions
through three views: Structural, Intent, and Conformity.
MV-LLMRec leverages LLMs to generate rich semantic
representations for intent and conformity, which are
refined through graph propagation and dynamically fused via
an attention mechanism. We evaluate MV-LLMRec on the
Amazon-Movie and Amazon-Book datasets and show that it
significantly outperforms state-of-the-art baselines,
validating our approach.

MV-LLMRec: Multi-View Representation Learning with Large Language Models for Recommendation (Student Abstract)
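
Attention-based fusion of several view embeddings can be sketched as a softmax-weighted sum. The abstract does not give the attention form, so the dot-product scoring, the `query` vector (standing in for a learned parameter), and the function names are assumptions for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_views(views, query):
    """Fuse view embeddings with dot-product attention.

    views: dict mapping a view name to its embedding (equal-length lists).
    query: hypothetical learned vector that scores each view.
    """
    names = list(views)
    scores = [sum(q * x for q, x in zip(query, views[n])) for n in names]
    weights = softmax(scores)
    fused = [
        sum(w * views[n][i] for w, n in zip(weights, names))
        for i in range(len(query))
    ]
    return fused, dict(zip(names, weights))


views = {
    "structural": [0.2, 0.9],
    "intent": [1.0, 0.1],
    "conformity": [0.4, 0.4],
}
fused, weights = fuse_views(views, query=[1.0, 0.0])
```

With this query the intent view scores highest, so it receives the largest weight in the fused representation.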

Machine Learning (ML) models have significant potential
across research and industry to enable data-driven insights
and decision-making. Their performance relies on input data
quality, but real-world datasets often contain
imperfections, making data preprocessing essential yet
time-consuming. Our research proposes a proof-of-concept
model using Generative Artificial Intelligence (GenAI) to
analyze and transform data for supervised ML
classification. The results from the GenAI models will be
compared with traditionally preprocessed data to evaluate
effectiveness. Preliminary results indicate that
incorporating GenAI models into the preprocessing pipeline
shows potential for improving ML classification performance.

Generative AI-Driven Data Transformation for Enhanced Machine Learning Performance (Student Abstract)

Automated cancer segmentation in Whole Slide Images (WSIs) has been dominated by a paradigm of static pattern recognition, where even advanced methods leveraging Transformers, Multiple Instance Learning, or topology-aware losses remain fundamentally descriptive and correlational. To address this limitation, we reframe WSI segmentation from a descriptive task to one of causal process modeling. We introduce Topo-GraT, a novel framework featuring a Causal Growth Field (CGF) to model tumor invasion dynamics and a Causal Flow Attention (CFA) mechanism that embeds this field as an architectural prior. This causal engine is integrated within an iterative graph refinement loop that uses segmentation uncertainty to dynamically focus computational resources on the most ambiguous tissue regions. Our comprehensive experiments on multiple WSI datasets demonstrate that Topo-GraT establishes a new state of the art, significantly outperforming existing methods and reducing the 95% Hausdorff Distance, a key boundary metric, by over 15%. Crucially, our framework yields the CGF as a rich, interpretable output whose structure correlates with tumor aggressiveness, positioning it as a novel biomarker for downstream prognostic tasks. By shifting the paradigm from static recognition to causal reasoning, Topo-GraT offers a more robust, efficient, and clinically insightful approach, setting a new direction for causally aware medical image analysis.

Topo-GraT: Learning to Grow with Causal Graph Transformers (Student Abstract)
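
For readers unfamiliar with the boundary metric mentioned above, the 95% Hausdorff Distance can be sketched as follows. This is a generic illustration on small point sets, not the evaluation code of the paper; it assumes a nearest-rank percentile and the common symmetric form that takes the larger of the two directed values. Implementations differ in both choices.

```python
import math


def percentile(values, q):
    """Nearest-rank percentile for an integer q in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def directed_distances(a, b):
    """For each point in a, the distance to its nearest point in b."""
    return [min(math.dist(p, q) for q in b) for p in a]


def hd95(a, b):
    """Symmetric 95th-percentile Hausdorff distance between two boundary point sets."""
    return max(
        percentile(directed_distances(a, b), 95),
        percentile(directed_distances(b, a), 95),
    )


predicted = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
reference = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
distance = hd95(predicted, reference)  # every point is 1.0 from the other boundary
```

Taking the 95th percentile instead of the maximum makes the metric less sensitive to a few outlier boundary points.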

The management and annotation of complex, multi-modal scientific data remain a major obstacle for AI-driven research due to the poor reusability and scalability of current solutions. We propose SciDataMAS, a novel LLM-powered multi-agent system (MAS), which automates scientific data management through a structured data lake with provenance-based organization and an adaptive metadata taxonomy. The system uses specialized workflows for automated dataset creation, data insertion, and retrieval. Experiments show the system's proficiency, with modern LLMs like GPT-5 successfully generating rich metadata schemas and filling them with high accuracy. This work provides a foundational step towards fully automated, reusable, and scalable scientific data organization, which may help the scientific community generate and accumulate well-annotated, AI-ready datasets.

SciDataMAS: LLM-Driven MAS for Scientific Data Management (Student Abstract)

Offline Meta-Reinforcement Learning (OMRL) leverages pre-collected data to adapt to new tasks. Context-based methods learn task representations from contexts. However, the context is influenced by both the task and the behavior policy. The mismatch between the behavior policy and the testing policy causes a context distribution shift problem, which results in poor task representations and degraded performance. This problem is exacerbated in settings with data limitations. To address this, we propose a novel approach called Meta-Normalizing Flow (Meta-NF). First, it employs a highly expressive and sample-efficient normalizing flow policy. Second, it incorporates a metric for testing-time task representation selection to effectively mitigate the context shift problem. Empirical results demonstrate that Meta-NF outperforms existing OMRL methods, with both components contributing to its strong performance.

Meta-Normalizing Flow for Data-Limited Offline Meta-Reinforcement Learning (Student Abstract)

3D Gaussian splatting (3DGS) has recently demonstrated
significant potential in computer vision, enabling
high-fidelity 3D scene reconstruction with real-time
rendering and fast training times. However, existing
methods struggle in large, visually sparse, geometrically
self-similar environments due to heavy reliance on
image-based feature matching and depth information. In this
work, we propose a novel reconstruction pipeline that
reduces the dependence on visual features by incorporating
IMU and LiDAR data to generate accurate point clouds and
robustly localize images within the scene. Global
colorization is achieved through 3D-to-2D projections of
the localized images, which are then used to supervise 3DGS
training. Our results demonstrate that the proposed
pipeline significantly enhances the quality of 3D
reconstruction for large, sparse scenarios, opening up new
opportunities for applications in remote mapping and
autonomous inspection.

3D Gaussian Splatting for Reconstructing Large Sparse Environments (Student Abstract)
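
The colorization step relies on projecting 3D points into localized images. A minimal sketch of that projection with a standard pinhole camera model follows. It assumes the points are already expressed in the camera frame and ignores lens distortion and occlusion; the function names and the nested-list image layout are illustrative choices, not details from the abstract.

```python
def project_point(point_cam, fx, fy, cx, cy):
    """Pinhole projection of a camera-frame 3D point to pixel coordinates."""
    x, y, z = point_cam
    if z <= 0:
        return None  # point is behind the camera
    return (fx * x / z + cx, fy * y / z + cy)


def colorize(points_cam, image, fx, fy, cx, cy):
    """Assign each point the color of the pixel it projects to, or None."""
    height, width = len(image), len(image[0])
    colors = []
    for point in points_cam:
        uv = project_point(point, fx, fy, cx, cy)
        if uv is None:
            colors.append(None)
            continue
        u, v = int(round(uv[0])), int(round(uv[1]))
        inside = 0 <= u < width and 0 <= v < height
        colors.append(image[v][u] if inside else None)
    return colors


# Tiny 2x2 RGB image indexed as image[row][column].
image = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]
points = [(1.0, 0.0, 1.0), (0.0, 0.0, -1.0), (5.0, 5.0, 1.0)]
colors = colorize(points, image, fx=1.0, fy=1.0, cx=0.0, cy=0.0)
```

Points behind the camera or outside the image bounds receive no color here; in a multi-image setting they would be colored from another view.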

We introduce a single-backbone foundation model for brain MRI that supports dynamic modality integration: it operates with arbitrary, possibly unseen, combinations of MRI sequences during both pretraining and transfer. The encoder is conditioned by text-derived modality embeddings via conditional layer normalization, while a variance-covariance penalty discourages feature collapse. Unlike expert-based designs that grow with each new sequence, our approach scales without adding modality-specific branches. Pretrained with self-supervision on ∼60,000 heterogeneous MRIs, the model learns modality-aware yet modality-agnostic features. We outline evaluation on segmentation and classification under missing/unseen modalities and cross-center shifts, and present early feasibility on multiple sclerosis lesion segmentation under limited data. This work moves toward robust, protocol-agnostic MRI foundation models suited to real clinical variability.

A Foundation Model for Brain MRI with Dynamic Modality Integration (Student Abstract)
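
Conditional layer normalization can be sketched in a few lines: the features are normalized as usual, but the scale and shift are predicted from a conditioning vector, here standing in for a text-derived modality embedding. The linear maps, parameter names, and dimensions below are assumptions for illustration and do not describe the model's actual architecture.

```python
import math


def linear(weights, bias, x):
    """Apply a dense layer: one output per row of weights."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def conditional_layer_norm(x, cond, w_gamma, b_gamma, w_beta, b_beta, eps=1e-5):
    """Layer norm whose scale (gamma) and shift (beta) depend on cond."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    normed = [(v - mean) / math.sqrt(var + eps) for v in x]
    gamma = linear(w_gamma, b_gamma, cond)  # per-feature scale from the embedding
    beta = linear(w_beta, b_beta, cond)     # per-feature shift from the embedding
    return [g * n + b for g, n, b in zip(gamma, normed, beta)]


features = [1.0, 2.0, 3.0]
modality_embedding = [0.5, -0.5]  # hypothetical embedding of a sequence name
zeros = [[0.0, 0.0] for _ in features]
# With zero weights, gamma = 1 and beta = 0, which recovers plain layer norm.
plain = conditional_layer_norm(
    features, modality_embedding, zeros, [1.0] * 3, zeros, [0.0] * 3
)
```

Because only the normalization parameters depend on the modality, a new sequence needs a new embedding rather than a new branch of the network.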

In safe reinforcement learning (SRL), there exists an
inherent conflict between maximizing reward and minimizing
cost. We propose a novel approach that resolves this
conflict within a single joint optimization. When the cost
exceeds the threshold,
we perform cost-reducing updates. Otherwise, we compute
policy gradients that maximize expected rewards, while
using second-order Taylor approximation to evaluate whether
these reward-maximizing gradients would violate the cost
constraint. If constraint violation is detected, we adjust
the gradient direction to maintain safety compliance;
otherwise, we execute standard reward-increasing policy
updates. This approach helps ensure that reward-seeking
updates do not inadvertently increase costs, thereby
reducing the likelihood of constraint violations. Empirical
tests show our framework successfully manages reward-cost
trade-offs through reward augmentation and cost shaping,
improving both performance and safety without switching
optimization strategies. Results demonstrate that
concurrent treatment of both objectives in one policy
gradient update is viable for improving safe reinforcement
learning methods.
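
The update rule described above can be sketched on plain vectors. This is an illustration under assumptions, not the authors' algorithm: the abstract does not say how the gradient direction is adjusted, so the code removes the component of the reward gradient that lies along the cost gradient. The function names and the explicit Hessian argument are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(matrix, v):
    return [dot(row, v) for row in matrix]


def safe_update_direction(cost, threshold, g_reward, g_cost, h_cost, lr):
    """Choose an ascent direction for the policy parameters.

    g_reward, g_cost: gradients of expected reward and expected cost.
    h_cost: Hessian of expected cost, given as a list of rows.
    """
    if cost > threshold:
        # Constraint already violated: take a cost-reducing step.
        return [-g for g in g_cost]

    # Second-order Taylor estimate of the cost after the step lr * g_reward.
    step = [lr * g for g in g_reward]
    predicted = cost + dot(g_cost, step) + 0.5 * dot(step, matvec(h_cost, step))
    if predicted <= threshold:
        return list(g_reward)  # standard reward-increasing update

    # Assumed adjustment: project out the component along the cost gradient.
    norm_sq = dot(g_cost, g_cost)
    if norm_sq == 0.0:
        return list(g_reward)
    scale = dot(g_reward, g_cost) / norm_sq
    return [gr - scale * gc for gr, gc in zip(g_reward, g_cost)]


no_curvature = [[0.0, 0.0], [0.0, 0.0]]
adjusted = safe_update_direction(0.9, 1.0, [1.0, 1.0], [1.0, 0.0], no_curvature, lr=0.5)
```

In the example the unadjusted step is predicted to push the cost from 0.9 to 1.4, above the threshold of 1.0, so the returned direction keeps only the part of the reward gradient that leaves the cost unchanged to first order.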
