Singapore

In safe reinforcement learning (SRL), there is an
inherent conflict between maximizing reward and minimizing
cost. We propose a novel approach that resolves this
conflict within a single joint optimization. When the cost
exceeds the threshold, we perform cost-reducing updates.
Otherwise, we compute policy gradients that maximize
expected reward and use a second-order Taylor approximation
to evaluate whether these reward-maximizing gradients would
violate the cost constraint. If a constraint violation is
predicted, we adjust the gradient direction to maintain
safety compliance; otherwise, we execute standard
reward-increasing policy updates. This approach helps
ensure that reward-seeking
updates do not inadvertently increase costs, thereby
reducing the likelihood of constraint violations. Empirical
tests show our framework successfully manages reward-cost
trade-offs through reward augmentation and cost shaping,
improving both performance and safety without switching
optimization strategies. Results demonstrate that
concurrent treatment of both objectives in one policy
gradient update is viable for improving safe reinforcement
learning methods.
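
The update rule described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the flat parameter vector, and especially the projection used to "adjust the gradient direction" are assumptions, since the abstract does not say how the adjustment is computed.

```python
import numpy as np

def second_order_cost(cost, grad_c, hess_c, step):
    """Second-order Taylor estimate of the cost after moving by `step`."""
    return cost + grad_c @ step + 0.5 * step @ hess_c @ step

def capo_like_step(theta, grad_r, grad_c, hess_c, cost, threshold, lr=0.1):
    """One hypothetical update; all names here are illustrative."""
    if cost > threshold:
        # Constraint already violated: take a cost-reducing step.
        return theta - lr * grad_c
    step = lr * grad_r  # candidate reward-increasing step
    if second_order_cost(cost, grad_c, hess_c, step) > threshold:
        # Assumed adjustment: remove the component of the reward
        # gradient that points along the cost gradient.
        overlap = grad_r @ grad_c
        denom = grad_c @ grad_c
        if overlap > 0 and denom > 0:
            step = lr * (grad_r - (overlap / denom) * grad_c)
    return theta + step
```

With this projection, the adjusted step leaves the cost unchanged to first order, which is one simple way to keep a reward-seeking update from raising the cost.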

AAAI 2026

CAPO: A Unified Policy Gradient Approach for Reward and Cost Optimization in Safe Reinforcement Learning (Student Abstract)

taylor approximation

safe reinforcement learning

policy gradient

poster

We are pleased to announce the Fortieth AAAI Conference on Artificial Intelligence (AAAI-26), which will be held in Singapore EXPO from January 20 to January 27, 2026.

The purpose of the AAAI conference series is to promote research in Artificial Intelligence (AI) and foster scientific exchange between researchers, practitioners, scientists, students, and engineers across the entirety of AI and its affiliated disciplines. AAAI-26 will feature technical paper presentations, special tracks, invited speakers, workshops, tutorials, poster sessions, senior member presentations, competitions, exhibit programs, and a range of other activities to be announced.

Offline Zero-Shot Reinforcement Learning requires an agent
to solve unseen tasks using only a fixed offline dataset
without explicit rewards. A central challenge is learning
representations that capture both high-level long-term
planning and low-level physical dynamics. We propose a
novel framework, Dynamics-Aware Planning Representation
(DAPR), which disentangles these two aspects via
complementary contrastive objectives. Specifically, DAPR
learns goal-oriented planning directions and local
dynamics-consistent directions in the latent space. By
jointly enforcing these constraints, DAPR yields
representations that balance “where to go” with “how to
move.” Experiments on standard locomotion benchmarks
(Walker, Cheetah, Quadruped) demonstrate that DAPR
consistently improves performance and generalization over
strong baselines, achieving substantial gains on
precision-demanding tasks.

Dynamics-Aware Planning Representation for Zero-Shot Reinforcement Learning (Student Abstract)

The scarcity of parallel corpora for Mongolian and Chinese constrains the performance of Mongolian-Chinese neural machine translation (NMT), most visibly in the inaccurate translation of specialized terminology. To address this limitation, this study adopts a lexically constrained augmentation strategy that constructs pseudo-source sentences by appending Chinese constraint words to Mongolian source texts, while enforcing the inclusion of these constraints in the output to improve translation accuracy. However, this approach has two inherent drawbacks: processing pseudo-sentences with a single encoder tends to induce semantic interference, and the introduced constraint words may exacerbate alignment errors during decoding. To overcome these limitations, this paper proposes a Constraint-Augmented Mongolian-Chinese NMT method (CANMT) based on dynamic feedback alignment. The method employs a dual-encoder architecture to isolate bilingual representations, coupled with a dynamic feedback alignment module that progressively reduces alignment errors through iterative refinement, thereby enhancing overall translation performance.

Constraint-Augmented Mongolian-Chinese Neural Machine Translation Based on Dynamic Feedback Alignment (Student Abstract)

Camouflaged object detection is critical for military, defense, and security operations, where targets evade conventional surveillance by mimicking the background or exhibiting low-contrast differences. It also supports non-invasive monitoring of elusive wildlife and endangered species, improving population estimates, habitat management, and biodiversity assessments by recovering objects that are visually indistinguishable from their surroundings. Existing solutions have large parameter counts and high computational demands, which hinder deployment in real-world applications. Lightweight models have been explored, but they often compromise fine boundary fidelity. This paper introduces a lightweight Laplacian pyramid–based feature extractor network that progressively aggregates multiscale Laplacian features with frequency information. The proposed architecture emphasizes object edge boundaries, enabling precise localization under subtle target–background differences while maintaining real-time efficiency. The design achieves performance comparable to state-of-the-art (SOTA) convolution-based methods on the CHAMELEON and NC4K datasets.
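
For readers unfamiliar with the Laplacian pyramid mentioned above, the decomposition can be sketched as below. This is a generic illustration under simplifying assumptions, not the LaFINet architecture: it uses a 2x2 box filter and nearest-neighbour upsampling in place of the usual Gaussian blur, works on a single-channel array, and assumes the image sides are divisible by 2 for every level. All function names are hypothetical.

```python
import numpy as np

def downsample(img):
    """Halve resolution by averaging non-overlapping 2x2 blocks."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample(img):
    """Double resolution by repeating each pixel (nearest neighbour)."""
    return img.repeat(2, axis=0).repeat(2, axis=1)

def laplacian_pyramid(img, levels):
    """Return band-pass detail layers plus the low-resolution residual."""
    bands = []
    current = img.astype(float)
    for _ in range(levels):
        low = downsample(current)
        bands.append(current - upsample(low))  # detail lost by downsampling
        current = low
    return bands, current

def reconstruct(bands, residual):
    """Invert the pyramid by adding details back, coarse to fine."""
    current = residual
    for band in reversed(bands):
        current = upsample(current) + band
    return current
```

Each band holds the edge-like detail at one scale, which is why such decompositions are a natural fit for boundary-sensitive tasks.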

LaFINet: Laplacian-Based Frequency Injection Network for Camouflage Object Detection (Student Abstract)

The computational cost of large language models (LLMs) is
a primary obstacle to sustainable deployment. Static
resource allocation is inefficient, as not all inputs
require the same depth of processing. We propose a
framework for adaptive, compute-efficient learning via
conceptual criticality, which dynamically tailors
computation to the assessed difficulty of an input. A
lightweight criticality prediction module estimates
conceptual complexity on a continuous scale, and this
score governs the LLM’s inference pathway, selectively
activating token pruning, layer skipping, and quantization.
Simple inputs are processed with minimal FLOPs and latency,
while complex inputs use the model’s full capacity to
preserve accuracy. We benchmark our framework and introduce
metrics to quantify sensitivity to input criticality
and per-sample computational savings. Results demonstrate
an improved accuracy-efficiency trade-off, paving the way
for more resource-aware systems.
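
As a rough illustration of how a continuous criticality score could govern the inference pathway, consider the sketch below. Everything in it is an assumption made for the example: the `InferencePlan` fields, the thresholds, the linear schedules, and the default layer count are hypothetical and are not taken from the abstract.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class InferencePlan:
    token_keep_ratio: float  # fraction of tokens kept after pruning
    layers_skipped: int      # number of transformer layers to skip
    weight_bits: int         # quantization width for the weights

def plan_for(criticality: float, num_layers: int = 24) -> InferencePlan:
    """Map a criticality score in [0, 1] to a hypothetical compute plan."""
    c = min(max(criticality, 0.0), 1.0)  # clamp out-of-range scores
    keep = 0.5 + 0.5 * c                 # prune up to half the tokens
    skipped = int(round((1.0 - c) * num_layers / 2))  # skip up to half the layers
    bits = 16 if c >= 0.7 else 8 if c >= 0.3 else 4
    return InferencePlan(keep, skipped, bits)
```

Under these assumed schedules, a score of 1.0 uses the full model, while a score of 0.0 keeps half the tokens, skips half the layers, and runs at 4-bit precision.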

Adaptive Compute Efficient Learning via Conceptual-Criticality (Student Abstract)

The dependency of stock prices on a multitude of factors
makes the task of prediction exceedingly challenging. Given
the volatile nature of stock data, it is imperative to
integrate multiple sources of information to accurately
encompass the various factors that influence market trends.
To capture these complex dynamics, several multimodal
methodologies have been proposed, integrating market data,
technical indicators, and textual information. However, it
is claimed that these coarse-grained information sources do
not offer a holistic view of the market. Furthermore, these
sources are stock-specific and do not elucidate the
interconnections between various stocks. To address this
deficiency, we propose a multimodal approach that
incorporates this relational aspect alongside fine-grained
information sources. The applicability of our framework is
underscored by empirical results, which demonstrate the
superiority of our approach.

An Approach Towards Developing Relationally Intelligent Multimodal Framework for Stock Movement Prediction (Student Abstract)

This work explores Liquid Time-Constant Networks (LTCs) and
Closed-form Continuous-time Networks (CfCs) for modeling
retinal ganglion cell activity in tiger salamanders across
three datasets. Compared to a convolutional baseline and an
LSTM, both architectures achieved lower MAE, faster
convergence, smaller model sizes, and favorable query
times, though with slightly lower Pearson correlation.
Their efficiency and adaptability make them well suited for
scenarios with limited data and frequent retraining, such
as edge deployments in vision prosthetics.

Modeling Retinal Ganglion Cells with Neural Differential Equations (Student Abstract)

Estimating causal effects under network interference is
challenging, especially when edges are heterogeneous and
nodes share latent dependencies. We study this realistic
setting and propose MVDR, a targeted maximum likelihood
(TMLE) framework that learns multi-view representations of
covariates and exposure on heterogeneous networks while
achieving double robustness: consistency holds if either
the outcome model or the exposure density is correctly
specified. MVDR supports multiple network interventions
using only the observed network structure. On three
semi-synthetic datasets, MVDR reduces intervention-level
prediction error relative to baselines and remains stable
under misspecification.
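
The double-robustness property can be illustrated with the classical augmented inverse-probability-weighted (AIPW) estimator. This is a swapped-in, simpler technique chosen for illustration: it is not TMLE, it ignores network interference entirely, and the function and argument names are hypothetical.

```python
import numpy as np

def aipw_ate(y, a, mu1, mu0, e):
    """AIPW estimate of the average treatment effect.

    y: outcomes, a: binary treatments,
    mu1/mu0: outcome-model predictions under treatment/control,
    e: estimated treatment probabilities (propensities).
    """
    y, a, mu1, mu0, e = map(np.asarray, (y, a, mu1, mu0, e))
    treated = mu1 + a * (y - mu1) / e              # model + weighted residual
    control = mu0 + (1 - a) * (y - mu0) / (1 - e)
    return float(np.mean(treated - control))
```

If the outcome model is correct, the residual terms average out whatever the propensities are; if the propensities are correct, the weighting corrects a wrong outcome model. Either condition alone is enough for consistency.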

Doubly Robust Causal Estimation Under Multi-View Network Interference (Student Abstract)

Dataset distillation methods learn a representative summary
of the full dataset such that training on the distilled
data is more efficient in terms of time and space. The
current state-of-the-art methods exploit the correspondence
between infinitely wide neural networks (NNs) and kernel
ridge regression to design distillation methods that result
in high-quality summaries of the data. In this work, we
leverage the correspondence between infinitely wide
networks and Gaussian Processes (GPs) for learning a
distilled dataset. We investigate the feasibility of using
the inducing points method for Gaussian Processes as a
data distillation method. While most existing dataset
distillation methods are based on loss or gradient
matching, our method relies on function-space
approximation, facilitated by the NN-GP correspondence.
Additionally, using recent theoretical results on GP
regression and neural tangent kernels (NTKs), we also
provide an upper bound on the size of the distilled data.
We demonstrate the utility of inducing points as distilled
data on a set of datasets empirically.
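
To make the idea of inducing points concrete, the sketch below compares the exact GP predictive mean with a subset-of-regressors approximation that summarizes the training data through a small set of inducing inputs. The choice of approximation, the RBF kernel (rather than an NTK), the 1-D inputs, and all names are assumptions made for illustration; the abstract does not specify which inducing-point formulation is used.

```python
import numpy as np

def rbf(a, b, lengthscale=0.5):
    """RBF kernel matrix between 1-D input arrays a and b."""
    sq = (a[:, None] - b[None, :]) ** 2
    return np.exp(-0.5 * sq / lengthscale**2)

def full_gp_mean(x, y, x_new, noise=0.1):
    """Exact GP predictive mean; `noise` is the noise variance."""
    k = rbf(x, x) + noise * np.eye(len(x))
    return rbf(x_new, x) @ np.linalg.solve(k, y)

def inducing_mean(x, y, z, x_new, noise=0.1):
    """Subset-of-regressors mean using inducing inputs z."""
    k_uu = rbf(z, z)
    k_uf = rbf(z, x)
    a = noise * k_uu + k_uf @ k_uf.T
    return rbf(x_new, z) @ np.linalg.solve(a, k_uf @ y)
```

When the inducing inputs coincide with the training inputs, the approximation reduces to the exact GP mean; with fewer inducing inputs, they act as a compressed stand-in for the full dataset.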

How Good Are Inducing Points for Dataset Distillation? (Student Abstract)

Traditional intercultural communication training often lacks safe spaces for open practice, leading to self-censorship and limited skill development. The ICC Tutor, an AI-powered conversational system, addresses this by offering a private, nonjudgmental environment for reflection and dialog. Using retrieval-augmented generation (RAG), the system grounds its prompts and feedback in course materials. We conducted a mixed-methods study (N = 25) with beginner/intermediate and expert learners. Preliminary findings suggest that the tutor helped reduce feelings of nervousness. While many beginners reported increased confidence in intercultural communication, expert learners’ confidence temporarily decreased, suggesting the AI’s role in fostering deeper self-reflection rather than just boosting perceived competence. These findings underscore the potential of AI tutors in supporting communication education and highlight the need for experience-adaptive designs to support nuanced learning trajectories.

Adaptive AI for Personalized Intercultural Communication Education: A Conversational Agent Powered by Retrieval-Augmented Generation (Student Abstract)

In this paper, we compare the adversarial robustness of
deep neural networks (DNNs) for classification with that of
optimal classifiers. We look at the smallest magnitude of possible
additive perturbations that can change a classifier's
output. We provide a matrix-theoretic explanation of the
adversarial fragility of DNNs for classification. In
particular, our theoretical results show that the
adversarial robustness of a neural network can degrade as
the input dimension d increases. Analytically, we show
that the adversarial robustness of neural networks can be
only 1/√d of the best possible adversarial
robustness of optimal classifiers. Our theories match
remarkably well with empirical results. The
matrix-theoretic explanation aligns with an earlier
information-theoretic feature-compression-based explanation
for the adversarial fragility of neural networks.
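
The robustness measure used here, the smallest additive perturbation that changes a classifier's output, has a closed form in the simplest case of a linear binary classifier sign(w·x + b). The sketch below computes it; it is only a toy illustration of the quantity being studied, not the matrix-theoretic analysis of the paper, and the function name is hypothetical.

```python
import numpy as np

def min_perturbation(w, b, x):
    """Smallest L2 perturbation moving x onto the boundary w.x + b = 0."""
    score = w @ x + b
    # Move against the score along w, the shortest path to the hyperplane.
    return -score / (w @ w) * w
```

The norm of this perturbation equals |w·x + b| / ||w||, the distance from x to the decision boundary; scaling it up by any small margin flips the predicted class.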
