Offline Meta-Reinforcement Learning (OMRL) leverages pre-collected data to adapt to new tasks. Context-based methods infer task representations from contexts, i.e., segments of collected transitions. However, a context reflects both the task and the behavior policy that collected it. The mismatch between the behavior policy and the policy used at test time induces a context distribution shift, which yields poor task representations and degraded performance. This problem is exacerbated in data-limited settings. To address it, we propose a novel approach called Meta-Normalizing Flow (Meta-NF). First, it employs a highly expressive and sample-efficient normalizing flow policy. Second, it incorporates a metric for selecting task representations at test time, which effectively mitigates the context shift problem. Empirical results demonstrate that Meta-NF outperforms existing OMRL methods, with both components contributing to its strong performance.
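The abstract does not specify the architecture of the normalizing flow policy. As background, a minimal sketch of the core building block of such flows, a RealNVP-style affine coupling layer, is shown below. It conditions a scale-and-shift transform of half the dimensions on the other half, which keeps the mapping invertible with a cheap log-determinant; the class name, the linear "networks", and all hyperparameters here are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

class AffineCoupling:
    """Illustrative RealNVP-style affine coupling layer (NumPy).

    Splits the input x into (x1, x2), then transforms x2 with a scale
    and shift computed from x1. Because x1 passes through unchanged,
    the transform is exactly invertible and the Jacobian log-determinant
    is just the sum of the log-scales. Real flows stack many such layers
    with neural networks in place of the linear maps below.
    """

    def __init__(self, dim, seed=0):
        rng = np.random.default_rng(seed)
        self.d = dim // 2            # size of the untouched half
        k = dim - self.d             # size of the transformed half
        # Toy linear "networks" producing log-scale and shift from x1.
        self.Ws = 0.1 * rng.standard_normal((k, self.d))
        self.Wt = 0.1 * rng.standard_normal((k, self.d))

    def forward(self, x):
        x1, x2 = x[:self.d], x[self.d:]
        s = np.tanh(self.Ws @ x1)    # bounded log-scale for stability
        t = self.Wt @ x1             # shift
        y2 = x2 * np.exp(s) + t
        # Return transformed vector and log|det Jacobian| = sum(s).
        return np.concatenate([x1, y2]), s.sum()

    def inverse(self, y):
        y1, y2 = y[:self.d], y[self.d:]
        s = np.tanh(self.Ws @ y1)    # recomputable: y1 == x1
        t = self.Wt @ y1
        x2 = (y2 - t) * np.exp(-s)
        return np.concatenate([y1, x2])
```

In a flow policy, an action is sampled by drawing a Gaussian base sample and pushing it through the stacked layers; its exact log-probability follows from the base density plus the accumulated log-determinants, which is what makes the policy both expressive and tractable to train by maximum likelihood on offline data.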
