Singapore

Evaluating reward models is a fundamental challenge in reinforcement learning, particularly in settings where the reward model is learned or manually designed. The standard paradigm for Reward Model Evaluation (RME) involves training an optimal policy via reinforcement learning (RL) on the given reward model and assessing model quality through the performance of the resulting policy. However, this approach conflates the quality of the reward model with the effectiveness of RL training, and is computationally expensive due to the need for policy optimization. Recent methods attempt to circumvent this issue by evaluating reward models directly, without RL, but often rely on impractical assumptions such as access to a ground-truth reward, or fail to utilize available supervision in a fine-grained manner. To overcome these limitations, we propose the Policy Preference Alignment Coefficient (PPAC), a novel metric for RME that requires neither RL training nor ground-truth rewards. PPAC first generates a sequence of automatically ranked policy preferences that guarantees monotonic improvement in the policy value, and then quantifies the alignment between these generated preferences and those implied by the candidate reward model. Experimental results across gridworld and continuous control tasks demonstrate that PPAC yields preference sequences with consistently increasing policy values and outperforms existing metrics in evaluating reward model quality.
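
As a rough illustration of the alignment step only, the sketch below scores a candidate reward model against a sequence of policies that is already ranked from worst to best. It assumes a Kendall-tau-style pairwise agreement over estimated returns; the abstract does not give PPAC's exact definition, so the function `alignment_coefficient` and its input are hypothetical.

```python
from itertools import combinations


def alignment_coefficient(returns_under_candidate):
    """Pairwise agreement in [-1, 1] between a ranked policy sequence and
    the ordering implied by a candidate reward model.

    returns_under_candidate[i] is the estimated return of the i-th policy
    under the candidate reward model. Policies are assumed to be ordered
    from worst (index 0) to best. This Kendall-tau-style score is an
    illustrative assumption, not the paper's definition of PPAC.
    """
    n = len(returns_under_candidate)
    if n < 2:
        raise ValueError("need at least two policies")
    concordant = 0
    discordant = 0
    for i, j in combinations(range(n), 2):  # i < j, so policy j is preferred
        diff = returns_under_candidate[j] - returns_under_candidate[i]
        if diff > 0:
            concordant += 1   # reward model agrees with the ranking
        elif diff < 0:
            discordant += 1   # reward model reverses the ranking
        # ties count for neither side
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs
```

Under this assumption, a reward model whose returns increase along the sequence scores 1, one that reverses every preference scores -1, and no RL training on the candidate reward is needed to compute the score.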

AAAI 2026

Reward Model Evaluation via Automatically-Ranked Policy Alignment

ml: reinforcement learning

technical paper

We are pleased to announce the Fortieth AAAI Conference on Artificial Intelligence (AAAI-26), which will be held at Singapore EXPO from January 20 to January 27, 2026.

The purpose of the AAAI conference series is to promote research in Artificial Intelligence (AI) and foster scientific exchange between researchers, practitioners, scientists, students, and engineers across the entirety of AI and its affiliated disciplines. AAAI-26 will feature technical paper presentations, special tracks, invited speakers, workshops, tutorials, poster sessions, senior member presentations, competitions, exhibit programs, and a range of other activities to be announced.



This talk will introduce knowledge-guided machine learning
(KGML), a rapidly growing field of research where
scientific knowledge is deeply integrated in machine
learning frameworks to produce scientifically grounded,
explainable, and generalizable predictions even on
out-of-distribution data. This talk will present a
multi-dimensional view to organize prior research in KGML
in terms of the nature and format of scientific knowledge
used, the form of knowledge-ML integration explored, and
the method for incorporating scientific knowledge in ML for
diverse scientific use-cases. These KGML concepts will be
illustrated using a variety of case studies in ecology,
biology, and public health including modeling the quality
of water in lakes across the US and discovering novel
biological traits of organisms linked with evolution from
biodiversity images. The talk will conclude with a
discussion of emerging opportunities in KGML, especially in
the age of generative AI and foundation models, with
potential applications in a broad range of scientific
disciplines.

Knowledge-Guided Machine Learning: A Paradigm Shift in AI for Science

Fine-grained urban flow inference is crucial for urban planning and intelligent transportation systems, enabling precise traffic management and resource allocation.
However, the practical deployment of existing methods is hindered by two key challenges: the prohibitive computational cost of over-parameterized models and the suboptimal performance of conventional loss functions on the highly skewed distribution of urban flows.
To address these challenges, we propose a unified solution that synergizes architectural efficiency with adaptive optimization.
Specifically, we first introduce **PLGF**, a lightweight yet powerful architecture that employs a Progressive Local-Global Fusion strategy to effectively capture both fine-grained details and global contextual dependencies. 
Second, we propose **DualFocal** Loss, a novel function that integrates dual-space supervision with a difficulty-aware focusing mechanism, enabling the model to adaptively concentrate on hard-to-predict regions. 
Extensive experiments on four real-world scenarios validate the effectiveness and scalability of our method.
Notably, while achieving state-of-the-art performance, PLGF reduces the model size by up to 97% compared to current high-performing methods. Furthermore, under comparable parameter budgets, our model yields an accuracy improvement of over 10% against strong baselines.
The implementation is included in the **supplementary material**.

Boosting Fine-Grained Urban Flow Inference via Lightweight Architecture and Focalized Optimization

Recent studies have revealed Neural Collapse (NC) in deep classifiers, where last-layer weights and features align into an equiangular tight frame (ETF), concentrating class information along specific embedding directions. However, conventional fine-tuning typically disregards this structure, initializing task-specific classifier heads randomly. To explicitly leverage this phenomenon, we propose a simple yet effective method for metric learning: (1) initializing the classifier head along each class’s NC direction from a pretrained model to preserve the emergent structure, and (2) injecting small isotropic Gaussian noise during fine-tuning to boost generalization. In addition, we provide a theoretical bound proving that our method explicitly reduces cumulative weight drift from the NC-initialization, compared to standard fine-tuning. This suggests that our method better preserves the pretrained model’s class-specific structure. Empirically, this structural preservation yields Recall@K gains: reduced weight drift correlates with better performance. Concurrent decreases in the Neural Collapse 1 (NC1) measure confirm that stronger intra-class cohesion underlies these improvements. Furthermore, we validate the effectiveness of our method on class-imbalanced benchmarks.
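
The initialization step can be sketched as follows, assuming each class's NC direction is approximated by its globally centered mean of pretrained features, normalized to unit length. The abstract does not say how the direction is computed or where the noise is applied, so `nc_initialized_head`, `perturb`, and the choice to perturb the head weights are all assumptions made for illustration.

```python
import math
import random


def nc_initialized_head(features, labels, num_classes):
    """Return one unit-norm weight row per class, pointing along that
    class's mean pretrained feature after subtracting the global mean.
    (Assumed stand-in for the 'NC direction'; not the paper's exact recipe.)
    """
    if not features:
        raise ValueError("features must be non-empty")
    dim = len(features[0])
    n = len(features)
    global_mean = [sum(f[d] for f in features) / n for d in range(dim)]
    head = []
    for c in range(num_classes):
        members = [f for f, y in zip(features, labels) if y == c]
        if not members:
            raise ValueError(f"class {c} has no samples")
        direction = [
            sum(f[d] for f in members) / len(members) - global_mean[d]
            for d in range(dim)
        ]
        norm = math.sqrt(sum(v * v for v in direction))
        if norm == 0.0:
            raise ValueError(f"class {c} mean coincides with the global mean")
        head.append([v / norm for v in direction])
    return head


def perturb(weights, noise_std, rng):
    """Add isotropic Gaussian noise to every weight (hypothetical helper;
    the paper may inject noise elsewhere, e.g. into features)."""
    return [[w + rng.gauss(0.0, noise_std) for w in row] for row in weights]
```

With two balanced classes the two rows come out antipodal (cosine -1), which matches the ETF geometry, where K class directions have pairwise cosine -1/(K-1).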

Neural Collapse-Informed Initialization with Perturbation Injection in Classification-based Metric Learning

Link prediction is a fundamental task in network analysis with widespread applications, from social recommendation to knowledge graph completion. Fairness in this setting is critical, as biased predictions can propagate or exacerbate societal inequalities. Prior work adopts a dyadic perspective, enforcing fairness through demographic parity between intra-group and inter-group link predictions. However, this dyadic framing can obscure underlying disparities across subgroups, allowing systemic biases to go undetected. Moreover, we argue that demographic parity does not meet desired properties for fairness assessment in ranking-based tasks such as link prediction. We formalize the limitations of existing fairness evaluations and borrow a framework inspired by information retrieval that enables a more expressive assessment, addressing these limitations. Additionally, we propose a lightweight post-processing method combined with decoupled link predictors that effectively mitigates bias and achieves state-of-the-art fairness–utility trade-offs.
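
To make the dyadic limitation concrete, the sketch below computes a demographic-parity gap between intra-group and inter-group link scores, alongside a finer breakdown by group pair. Both functions and the toy data are hypothetical illustrations, not the evaluation framework proposed in the paper.

```python
from collections import defaultdict


def dyadic_parity_gap(pairs, scores, group):
    """|mean score of intra-group pairs - mean score of inter-group pairs|."""
    intra = [s for (u, v), s in zip(pairs, scores) if group[u] == group[v]]
    inter = [s for (u, v), s in zip(pairs, scores) if group[u] != group[v]]
    if not intra or not inter:
        raise ValueError("need at least one intra-group and one inter-group pair")
    return abs(sum(intra) / len(intra) - sum(inter) / len(inter))


def mean_score_by_group_pair(pairs, scores, group):
    """Mean predicted score for each unordered pair of groups."""
    buckets = defaultdict(list)
    for (u, v), s in zip(pairs, scores):
        key = tuple(sorted((group[u], group[v])))
        buckets[key].append(s)
    return {key: sum(vals) / len(vals) for key, vals in buckets.items()}


# Toy example: nodes 0 and 1 belong to group "a", nodes 2 and 3 to group "b".
group = {0: "a", 1: "a", 2: "b", 3: "b"}
pairs = [(0, 1), (2, 3), (0, 2), (1, 3)]
scores = [0.9, 0.1, 0.5, 0.5]

gap = dyadic_parity_gap(pairs, scores, group)
breakdown = mean_score_by_group_pair(pairs, scores, group)
```

In this toy example the dyadic gap is zero, yet the breakdown shows links within group "a" scored far higher than links within group "b": the kind of subgroup disparity that a purely dyadic view averages away.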

Breaking the Dyadic Barrier: Rethinking Fairness in Link Prediction Beyond Demographic Parity

Vision-Language Retrieval (VLR) aims to retrieve relevant visual or textual information from multimodal data using language or image queries. However, traditional VLR methods often rely on data-driven shallow semantic alignment and fail to understand the deeper structural and fine-grained entity features of queries, resulting in poor performance on multi-entity layouts and challenging entities. In this paper, we propose the Layout-Aware and Sketch-Enhanced (LASE) VLR framework, which refines query representations by incorporating multimodal layout and sketch knowledge. Specifically, layout knowledge encodes the spatial arrangement of entities, while sketch knowledge refines entity perception by capturing essential structural details. To extract these knowledge representations, we leverage the powerful semantic understanding of Large Language Models (LLMs) for layout generation and the fine-grained cross-modal generative capabilities of Diffusion Models (DMs) for sketch generation. However, integrating knowledge into queries may introduce biases and query-specific preferences due to varying visual content and knowledge demands. To address this, we propose the Gated Dual-Stream Knowledge Module (GDKM), which consists of a multi-instance fusion network with a sample-aware gating network. The fusion network aggregates diverse knowledge using multi-head attention to reduce bias, while the gating network adjusts knowledge weights based on query characteristics. Extensive experiments demonstrate that LASE significantly enhances VLR performance across multiple benchmarks, with superior generalization and transferability.

Imagine with Layout and Sketch: Enhancing Vision-Language Retrieval with Dual-Stream Multi-Modal Query Refinement

Leveraging vast amounts of unlabeled internet video data for embodied AI is currently bottlenecked by the lack of action labels and the presence of action-correlated visual distractors. Although recent latent action policy optimization (LAPO) has shown promise in inferring proxy action labels from visual observations, its performance degrades significantly when distractors are present. To address this limitation, we propose a novel object-centric latent action learning framework that centers on objects rather than pixels. We leverage self-supervised object-centric pretraining to disentangle the movement of the agent and distracting background dynamics. This allows LAPO to focus on task-relevant interactions, resulting in more robust proxy-action labels, enabling better imitation learning and efficient adaptation of the agent with just a few action-labeled trajectories. We evaluated our method in eight visually complex tasks across the Distracting Control Suite (DCS) and Distracting MetaWorld (DMW). Our results show that object-centric pretraining mitigates the negative effects of distractors by 50%, as measured by downstream task performance: average return (DCS) and success rate (DMW).

Object-Centric Latent Action Learning

Large language models (LLMs) are widely adopted across diverse AI applications.
To align LLM behavior with human values, Reinforcement Learning from Human Feedback (RLHF) employs a reward model (RM) as a proxy for human preferences to guide policy optimization.
Consequently, the accuracy, reliability, and interpretability of the RM critically influence downstream alignment outcomes.
However, conventional scalar RMs are both opaque and rigid, offering little insight into reward reasoning and lacking adaptability to evolving preferences.
While recent work on multidimensional RMs has sought to improve interpretability, these methods often fall short in feature-level attribution and incur substantial annotation costs.
To address these challenges, we propose the Sparse Autoencoder-enhanced Reward Model (**SARM**), a novel architecture that integrates a pretrained Sparse Autoencoder (SAE) into the reward modeling pipeline.
Specifically, SARM projects LLM hidden activations into a sparse monosemantic feature space, with a scalar head aggregating these features to produce reward scores attributable to interpretable concepts.
Experiments demonstrate that SARM enables direct attribution of reward scores to interpretable feature activations, supports dynamic preference adjustment, and outperforms standard scalar RMs in alignment tasks.
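
The attribution idea can be sketched in a few lines, assuming a ReLU SAE encoder and a bias-free linear scalar head so that the reward decomposes exactly into per-feature contributions. The function names, the encoder form, and the toy weights below are assumptions for illustration; the abstract does not specify SARM's architecture at this level of detail.

```python
def sae_encode(hidden, enc_weights, enc_bias):
    """Assumed ReLU SAE encoder: z = max(0, W h + b), one row of W per feature."""
    return [
        max(0.0, sum(w * h for w, h in zip(row, hidden)) + b)
        for row, b in zip(enc_weights, enc_bias)
    ]


def reward_with_attribution(hidden, enc_weights, enc_bias, head_weights):
    """Return (reward, contributions), where contributions[i] is the share of
    the reward attributable to sparse feature i and the shares sum to the reward.
    """
    features = sae_encode(hidden, enc_weights, enc_bias)
    contributions = [w * z for w, z in zip(head_weights, features)]
    return sum(contributions), contributions


# Toy values (hypothetical): a 2-d hidden state and 3 sparse features.
hidden = [1.0, 2.0]
enc_weights = [[1.0, 0.0], [0.0, -1.0], [1.0, 1.0]]
enc_bias = [0.0, 0.0, -1.0]
head_weights = [0.5, 3.0, -0.25]

reward, contributions = reward_with_attribution(
    hidden, enc_weights, enc_bias, head_weights
)
```

Because the head is linear in the sparse features, changing a single entry of `head_weights` changes only that feature's contribution, which is one plausible reading of how preference adjustment could work in such a design.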

Interpretable Reward Model via Sparse Autoencoder

With the widespread deployment of large language models (LLMs) in human-computer interaction, dark patterns have extended from traditional visual interfaces to conversational AI systems. While existing research has confirmed the prevalence of dark patterns in LLMs, current evaluation benchmarks face critical challenges including limited classification coverage, overlooked risks specific to reasoning models, and inadequate consideration of cross-linguistic differences. To address these limitations, we propose DarkBench+, an extended benchmark for evaluating dark patterns in LLMs. We construct an expanded taxonomy containing 10 major categories and 24 subcategories, introduce an annotation workflow combining manual and automated methods, and design 2,088 bilingual test samples in Chinese and English. This benchmark is the first to develop specialized evaluation dimensions for reasoning models, and it systematically evaluates dark pattern behaviors across nearly 40 mainstream LLMs. Experimental results demonstrate significant manipulation risks in reasoning models' transparency displays, and the cross-linguistic evaluation analyzes how manipulation behavior differs across languages, supporting more ethical and responsible LLM development.

DarkBench+: An Extended Benchmark for Evaluating Dark Patterns in Large Language Models

This paper presents the first AI/ML system for automating building damage assessment in small uncrewed aerial systems (sUAS) imagery to be deployed operationally during federally declared disasters (Hurricanes Debby and Helene). In response to major disasters, sUAS teams are dispatched to collect imagery of the affected areas to assess damage; however, at recent disasters, teams collectively delivered between 47GB and 369GB of imagery per day, representing more imagery than can reasonably be transmitted or interpreted by subject matter experts at the disaster scene, thus delaying response efforts. To alleviate this data avalanche encountered in practice, computer vision and machine learning techniques are necessary. While prior work has been deployed to automatically assess damage in satellite imagery, there is no current state of practice for sUAS-based damage assessment systems for operational use, as all known work has been confined to academic settings. This work establishes the state of practice via the development and deployment of models for building damage assessment with sUAS imagery. The development of the models consisted of training on the largest known dataset of post-disaster sUAS aerial imagery, which consists of 21,716 building damage labels, and the operational training of 91 disaster practitioners. The system was deployed during the responses to Hurricanes Debby and Helene, where it assessed a combined 415 buildings in approximately 18 minutes. This work contributes detailed documentation of the actual use of AI/ML for damage assessment during a disaster and lessons learned to the benefit of the AI/ML research and user communities.

Deploying Rapid Damage Assessments from sUAS Imagery for Disaster Response

Deep learning models are designed based on the i.i.d.
assumption; consequently, they experience a significant
performance drop due to the distribution shifts when
deployed in real environments. Domain Generalisation (DG)
aims to bridge the distribution shift between the source
and target domains by improving the generalisability of the
model to Out-Of-Distribution (OOD) data. This challenge is
prominent in satellite imagery classification due to the
scarcity of data from underrepresented regions such as
Africa and Oceania. In this paper, we address the
limitations of existing datasets in capturing distribution
shifts caused by geospatial differences between geographic
regions by constructing a new, large-scale dataset called
Domain Shift across Geographic Regions (DSGR). This dataset
aims to help researchers better understand the impact of
distribution shifts on satellite imagery classification.
Furthermore, we perform rigorous experiments on DSGR to
investigate and benchmark the robustness of existing DG
techniques under single- and multi-source domain settings
and the role of foundation models in enhancing the DG
techniques. Our evaluations reveal that recent DG
techniques have a comparable, yet weak, performance on
DSGR. However, when combined with a foundation model like
CLIP, ERM (introduced in 1999) achieves highly competitive
results, surpassing even recent state-of-the-art DG
solutions in enhancing the generalisability of deep
learning models across different geographic regions. Our
dataset and code are available at
https://github.com/RWGAI/DSGR.


