Singapore

External reasoning systems combine language models with process reward models (PRMs) to select high-quality reasoning paths for complex tasks such as mathematical problem solving. However, these systems are prone to reward hacking, where the PRM assigns high scores to logically incorrect reasoning paths, leading to incorrect answers. From a causal inference perspective, we attribute this phenomenon primarily to the presence of confounding semantic features. To address it, we propose Causal Reward Adjustment (CRA), a method that mitigates reward hacking by estimating the true reward of a reasoning path. CRA trains sparse autoencoders on the PRM's internal activations to recover interpretable features, then corrects for confounding via backdoor adjustment. Experiments on math problem-solving datasets demonstrate that CRA mitigates reward hacking and improves final accuracy, without modifying the policy model or retraining the PRM.
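The backdoor adjustment named in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes a single discrete confounder `z` (hypothetically, a binarised sparse-autoencoder feature), the function names are made up for this example, and the toy numbers are invented purely to show the effect. The adjusted estimate is the sum over `z` of P(z) times the mean reward given the path type and `z`.

```python
from collections import defaultdict


def naive_mean(samples, x):
    """Plain conditional mean E[R | X = x], with no adjustment."""
    rewards = [r for xi, _, r in samples if xi == x]
    return sum(rewards) / len(rewards)


def backdoor_adjusted_reward(samples, x):
    """Estimate E[R | do(X = x)] = sum_z P(z) * E[R | X = x, Z = z].

    samples: list of (x, z, r) tuples; z is an assumed discrete confounder.
    """
    z_counts = defaultdict(int)   # occurrences of each z, for P(z)
    sums = defaultdict(float)     # reward totals per z among samples with X = x
    counts = defaultdict(int)     # sample counts per z among samples with X = x
    for xi, zi, ri in samples:
        z_counts[zi] += 1
        if xi == x:
            sums[zi] += ri
            counts[zi] += 1
    n = len(samples)
    total = 0.0
    for z, cz in z_counts.items():
        if counts[z] == 0:
            raise ValueError(f"no samples with X={x!r} and Z={z!r}")
        total += (cz / n) * (sums[z] / counts[z])
    return total


# Invented toy data: z = 1 inflates the reward and co-occurs with incorrect paths.
toy = (
    [("incorrect", 1, 0.9)] * 3
    + [("incorrect", 0, 0.2)]
    + [("correct", 1, 0.95)]
    + [("correct", 0, 0.6)] * 3
)

print(naive_mean(toy, "incorrect"), naive_mean(toy, "correct"))
print(backdoor_adjusted_reward(toy, "incorrect"), backdoor_adjusted_reward(toy, "correct"))
```

In this toy data the unadjusted means rank the incorrect paths above the correct ones, while the adjusted estimates reverse that ordering, because the confounder is weighted by its overall frequency rather than by how often it appears with each path type.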

AAAI 2026

Causal Reward Adjustment: Mitigating Reward Hacking in External Reasoning via Backdoor Correction

(large) language models; safety and robustness; action, change, and causality

technical paper

We are pleased to announce the Fortieth AAAI Conference on Artificial Intelligence (AAAI-26), which will be held in Singapore EXPO from January 20 to January 27, 2026.

The purpose of the AAAI conference series is to promote research in Artificial Intelligence (AI) and foster scientific exchange between researchers, practitioners, scientists, students, and engineers across the entirety of AI and its affiliated disciplines. AAAI-26 will feature technical paper presentations, special tracks, invited speakers, workshops, tutorials, poster sessions, senior member presentations, competitions, exhibit programs, and a range of other activities to be announced.


Detecting the origin of information or infection spread in networks is a fundamental challenge with applications in misinformation tracking, epidemiology, and beyond. We study the multi-source detection problem: given snapshot observations of node infection status on a graph, estimate the set of source nodes that initiated the propagation. Existing methods either lack statistical guarantees or are limited to specific diffusion models and assumptions. We propose a novel conformal prediction framework that provides statistically valid recall guarantees for source set detection, independent of the underlying diffusion process or data distribution. Our approach introduces principled score functions to quantify the alignment between predicted probabilities and true sources, and leverages a calibration set to construct prediction sets with user-specified recall and coverage levels. The method is applicable to both single- and multi-source scenarios, supports general network diffusion dynamics, and is computationally efficient for large graphs. Empirical results demonstrate that our method achieves rigorous coverage with competitive accuracy, outperforming existing baselines in both reliability and scalability.

Conformal Prediction for Multi-Source Detection on a Network
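The calibration step described in the abstract above can be sketched with standard split conformal prediction. This is an illustration under stated assumptions, not the paper's score functions: the nonconformity score used here (one minus the lowest predicted probability among the true sources) and all function names are hypothetical choices, and the calibration scores are made-up numbers.

```python
import math


def source_score(probs, true_sources):
    """Assumed nonconformity score: 1 - lowest predicted probability among true sources."""
    return 1.0 - min(probs[v] for v in true_sources)


def conformal_threshold(cal_scores, alpha):
    """Split-conformal quantile: the ceil((n + 1) * (1 - alpha))-th smallest score."""
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return float("inf")  # too few calibration samples: keep every node
    return sorted(cal_scores)[k - 1]


def prediction_set(probs, q):
    """All nodes whose score 1 - p does not exceed the calibrated threshold q."""
    return {v for v, p in probs.items() if 1.0 - p <= q}


# Made-up calibration scores, one per calibration graph.
cal_scores = [0.1, 0.3, 0.2, 0.6, 0.4, 0.5, 0.7]
q = conformal_threshold(cal_scores, alpha=0.25)

# Predicted source probabilities for the nodes of a new graph.
test_probs = {"a": 0.9, "b": 0.5, "c": 0.1, "d": 0.05}
print(q, prediction_set(test_probs, q))
```

With this score, a test graph has all of its true sources inside the prediction set exactly when its own score is at most `q`, so under exchangeability of calibration and test samples the standard split-conformal argument gives full-recall coverage with probability at least `1 - alpha`. Relaxed recall levels would need a different score, which the abstract does not specify.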

Learning to manipulate diverse objects with multi-finger dexterous hands remains a significant challenge in robotics. Human-Object Interaction datasets constitute a rich repository of knowledge about task information and embodied interactions. Instead of solely imitating human demonstrations, we consider the hand-object interaction process as a whole by predicting hand-object future states holistically. The predicted object future states not only facilitate reinforcement learning by alleviating the heavy reliance on task-specific reward design, but also make our pipeline more general across task settings. We conduct extensive robot experiments across 3 challenging tasks with novel objects. Results demonstrate that our method outperforms existing SOTA methods in all 3 tasks, with higher success rates and better adaptability to novel object configurations. We also validate the cross-embodiment compatibility of our method on different robots, demonstrating the general utility of the learned priors.

Learning Object-Centric Motion Priors from Human for Robotic Dexterous Manipulation

Downstream fine-tuning of multi-modal large language models (MLLMs) is advancing rapidly, allowing general models to achieve superior performance on domain-specific tasks. Yet most prior research focuses on performance gains and overlooks the vulnerability of the fine-tuning pipeline: attackers can easily poison the dataset to implant backdoors into MLLMs. We conduct an in-depth investigation of backdoor attacks on MLLMs and reveal the phenomenon of **Attention Hijacking** and its **Hierarchical Mechanism**. Guided by this insight, we propose **PurMM**, a **test-time backdoor purification** framework that removes visual tokens exhibiting anomalous attention, thereby avoiding targeted outputs while restoring correct answers. PurMM contains three stages: (1) locating tokens with abnormal attention, (2) filtering them using deep-layer cues, and (3) zeroing out their corresponding components in the visual embeddings. Unlike existing defences, PurMM dispenses with retraining and training-process modifications, operating at test-time to restore model performance while eliminating the backdoor. Extensive experiments across multiple MLLMs and datasets show that PurMM maintains normal performance, sharply reduces attack success rates, and consistently converts backdoor outputs to benign ones, offering a new perspective for safeguarding MLLMs.

PurMM: Attention-Guided Test-Time Backdoor Purification in Multimodal Large Language Models

Lifelong person re-identification (LReID) aims to retrieve the target person from sequentially collected data. Due to significant domain gaps between datasets and the continuous increase of training data from different scenarios, weak inter-domain generalization and catastrophic forgetting issues have remained major challenges for LReID. To tackle these issues, a novel LReID method called Unified Representation Causal Prompt Distillation (URCPD) is proposed. Specifically, to reduce domain gaps among different scene datasets and improve model inter-domain generalization capability, a Feature Decoupling Style Transfer module (FDST) is proposed to map new features into a unified feature space. Furthermore, to reduce the accumulated forgetting of old knowledge during the training stage, a Causal Prompt Distillation module (CPD) is introduced. This module eliminates the re-inference process for distillation and embeds memory prompts to combat catastrophic forgetting. Extensive experiments on five classic LReID seen datasets and seven unseen datasets demonstrate that our method significantly outperforms state-of-the-art methods.

Unified Representation Causal Prompt Distillation for Re-Inference-Free Lifelong Person Re-Identification

In recent years, learning-based underwater image enhancement (UIE) techniques have rapidly evolved. However, distribution shifts between high-quality enhanced outputs and natural images can hinder semantic cue extraction for downstream vision tasks, thereby limiting the adaptability of existing enhancement models. To address this challenge, this work proposes a new learning mechanism that leverages Vision-Language Models (VLMs) to empower UIE models with semantic-sensitive capabilities. Concretely, our strategy first generates textual descriptions of key objects from a degraded image via a VLM. Subsequently, a text-image alignment model remaps these descriptions back onto the image to produce a spatial semantic guidance map. This map then steers the UIE network through a dual-guidance mechanism, which combines cross-attention and an explicit alignment loss. This forces the network to focus its restorative power on semantic-sensitive regions during image reconstruction, rather than pursuing a globally uniform improvement, thereby ensuring the faithful restoration of key object features. Experiments confirm that our strategy, when applied to different UIE baselines, significantly boosts their performance on perceptual quality metrics and enhances their performance on detection and segmentation tasks, validating its effectiveness and adaptability.

Empowering Semantic-Sensitive Underwater Image Enhancement with VLM

This paper addresses the nonparametric estimation of the drift function over a compact domain for a time-homogeneous diffusion process, based on high-frequency discrete observations from $N$ independent trajectories. We propose a neural network-based estimator and derive a non-asymptotic convergence rate, decomposed into a training error, an approximation error, and a diffusion-related term scaling as ${\log N}/{N}$. For compositional drift functions, we establish an explicit rate. In the numerical experiments, we consider a drift function with local fluctuations generated by a double-layer structure and show that the empirical convergence rate becomes independent of the input dimension $d$. Compared to the $B$-spline method proposed by Denis et al., the neural network estimator achieves better convergence rates and more effectively captures local features, particularly in higher-dimensional settings.

Drift Estimation for Diffusion Processes Using Neural Networks Based on Discretely Observed Independent Paths
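The setting of the drift-estimation abstract above (many independent paths observed at a small time step) can be sketched with a least-squares regression of increments on the current state. Everything here is an assumption made for illustration: a one-parameter linear model `b(x) = a * x` stands in for the neural network, the data are simulated from an Ornstein-Uhlenbeck process with the Euler-Maruyama scheme, and the squared loss on increments is a common contrast for this problem rather than something the abstract states.

```python
import math
import random


def simulate_ou_paths(n_paths, n_steps, dt, theta, sigma, rng):
    """Euler-Maruyama paths of dX = -theta * X dt + sigma dW, with random starts."""
    paths = []
    for _ in range(n_paths):
        x = rng.uniform(-2.0, 2.0)
        path = [x]
        for _ in range(n_steps):
            x = x - theta * x * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
            path.append(x)
        paths.append(path)
    return paths


def fit_linear_drift(paths, dt):
    """Closed-form least-squares fit of b(x) = a * x to (X_{k+1} - X_k) / dt."""
    num = 0.0
    den = 0.0
    for path in paths:
        for x_now, x_next in zip(path, path[1:]):
            num += x_now * (x_next - x_now)
            den += x_now * x_now * dt
    return num / den


rng = random.Random(0)  # fixed seed so the run is reproducible
paths = simulate_ou_paths(n_paths=200, n_steps=100, dt=0.01, theta=1.0, sigma=0.5, rng=rng)
a_hat = fit_linear_drift(paths, dt=0.01)
print(a_hat)  # should be close to -theta = -1.0
```

Replacing the linear model with a flexible function class and minimising the same squared error over all paths and time steps gives a nonparametric estimator of the kind the abstract discusses.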

Deep Learning techniques are nowadays pervasive in AI. However, these approaches suffer from a lack of transparency: they cannot justify their output or help users trust their decisions. For these reasons, alternative approaches to learning deserve to be explored, either for developing new tools with autonomous learning capability or for explaining the results of black-box predictors.
Among these, Inductive Logic Programming has played an important role since the 1990s, as have, more recently, approaches based on Learning from Answer Sets (LAS).
Computing inductive solutions for LAS tasks is known to be $\Sigma_2^P$-hard. In this work, we tackle this problem using a single-shot disjunctive ASP encoding based on the saturation technique originally proposed by Eiter and Gottlob. We prove that, when the background knowledge and hypothesis space form a tight program (a syntactical property), our encoding is linear in the size of the task. This approach contrasts with the state-of-the-art ILASP system, which relies on multiple iterative calls to an ASP solver. As a result, our encoding can be directly evaluated by modern disjunctive ASP solvers, leveraging decades of research and optimization in the ASP community.
We implement our method in a system named LASCO. Experimental results on a diverse set of benchmarks demonstrate that LASCO outperforms all versions of ILASP on many instances and that it scales when run on multi-threaded machines.

Learning from Answer Sets via Single-Shot Disjunctive ASP Encoding

The spatial reasoning task aims to reason about spatial relationships in 2D and 3D space, which is a fundamental capability for Visual Question Answering (VQA) and robotics. Although vision language models (VLMs) have developed rapidly in recent years, they still struggle with spatial reasoning. In this paper, we introduce a method that enhances Spatial reasoning through Visual and Textual thinking Simultaneously (SpatialVTS). In the spatial visual thinking phase, our model is trained to automatically generate location-related tokens for important targets. These cover not only the objects mentioned in the problem but also other objects potentially relevant to the reasoning. During the spatial textual thinking phase, our model conducts long-term thinking based on visual cues and dialogues and gradually infers the answers to spatial reasoning problems. To effectively support the model's training, we manually corrected an existing spatial reasoning dataset, eliminating numerous incorrect labels resulting from automatic annotation, restructuring the data input format to enhance generalization, and developing a reasoning framework for model thinking. Without introducing any additional information (such as masks or depth), our model significantly improves average performance across several spatial understanding tasks compared with other models.

Enhancing Spatial Reasoning Through Visual and Textual Thinking

This paper bridges two perspectives: it studies the multi-secretary problem through the fairness lens of social choice, and examines multi-winner elections from the viewpoint of online decision making. After identifying the limitations of the prominent proportionality notion of Extended Justified Representation (EJR) in the online domain, the work proposes a set of mechanisms that merge techniques from online algorithms with rules from social choice---such as the Method of Equal Shares and the Nash Rule---and supports them through both theoretical analysis and extensive experimental evaluation.

Fairness in the Multi-Secretary Problem

Neural coupling is a fundamental mechanism in neuroscience that facilitates the emergence of cognitive functions through dynamic interactions and synchronization among distributed brain regions. Inspired by this principle, we pose the question: Might the biological mechanism of neural oscillatory synchronization inspire feature representation learning for neuroscience? Addressing this question through the Kuramoto model, renowned for simulating oscillatory dynamics, we present a novel physics-informed deep model, `SyncBrain`, which models brain regions as interacting oscillatory units and simulates their temporal dynamics and synchronization patterns to distinguish cognitive states. Furthermore, inspired by the brain's inherent ability to dynamically attend to critical temporal information, we incorporate an adaptive control module that introduces an attention-like mechanism to guide information flow. We evaluate our model on multiple functional neuroimaging datasets, where it demonstrates promising performance and enhanced interpretability in both cognitive state decoding and early disease diagnosis, outperforming existing computational methods. These results demonstrate the effectiveness of neural oscillatory mechanisms in shaping robust and interpretable machine learning models for neuroscience applications.
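The Kuramoto model that the abstract above builds on can be simulated in a few lines. This sketch shows only the classical all-to-all model with explicit Euler integration; it is not the `SyncBrain` architecture, and the function names, coupling strength, step size, and initial phases are assumptions chosen for illustration.

```python
import cmath
import math


def order_parameter(phases):
    """Kuramoto order parameter r in [0, 1]; r = 1 means full synchrony."""
    return abs(sum(cmath.exp(1j * th) for th in phases)) / len(phases)


def kuramoto_step(phases, omegas, coupling, dt):
    """One Euler step of d(theta_i)/dt = omega_i + (K / N) * sum_j sin(theta_j - theta_i)."""
    n = len(phases)
    new_phases = []
    for th_i, w_i in zip(phases, omegas):
        pull = sum(math.sin(th_j - th_i) for th_j in phases)
        new_phases.append(th_i + dt * (w_i + coupling / n * pull))
    return new_phases


def simulate(phases, omegas, coupling, dt=0.01, steps=2000):
    """Integrate the model for a fixed number of Euler steps and return the final phases."""
    for _ in range(steps):
        phases = kuramoto_step(phases, omegas, coupling, dt)
    return phases


# Ten oscillators with identical natural frequencies, spread over less than half a circle.
start = [0.3 * i for i in range(10)]
omegas = [0.0] * 10

r_start = order_parameter(start)
r_coupled = order_parameter(simulate(start, omegas, coupling=2.0))
r_uncoupled = order_parameter(simulate(start, omegas, coupling=0.0))
print(r_start, r_coupled, r_uncoupled)
```

With positive coupling the phases contract toward a common value and the order parameter approaches 1, while without coupling the phases (and hence the order parameter) stay where they started. A learned model could treat such synchronization measures between regions as features, which is one plausible reading of the abstract rather than a description of the authors' pipeline.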
