Alignment of large language models (LLMs) with human preferences typically relies on supervised reward models or external judges, which in turn require abundant preference data. We propose a generative preference modeling approach for low-resource and domain-specific scenarios, reframing preference learning as an inverse reinforcement learning problem. Instead of training a discriminative reward model, we train the LLM itself to infer and maximize an implicit reward function underlying high-quality reasoning. Specifically, we leverage Chain-of-Thought (CoT) sampling to generate diverse candidate solutions for each query and derive fine-grained preferences from them without additional human labels. We also introduce an entropy-guided token scoring mechanism to rank and weight the sampled CoTs, boosting the importance of high-confidence answers and strategically important high-entropy tokens. Building on this, we train the model with our Self-Evaluated Group Advantage (SEGA) algorithm, which efficiently exploits the fine-grained preference information within each group of candidate solutions to update the policy. Our method eliminates dependence on external judges or reward classifiers, relying instead on the generative model’s own judgments. Experiments on general benchmarks and domain-specific tasks—such as mathematical reasoning and medical question answering—demonstrate that our generative preference model achieves significant improvements with limited data.
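The abstract does not include an implementation, but two of its ingredients can be sketched concretely: per-token entropy (used to weight sampled CoTs) and a group-relative advantage over candidate solutions (the core of a group-advantage update). The sketch below is our own illustration under stated assumptions; the function names, the softmax-entropy formulation, and the mean/std normalization are assumptions, not the paper's verbatim SEGA algorithm.

```python
import numpy as np

def token_entropy(logits):
    """Shannon entropy of the model's next-token distribution
    at each position. `logits` has shape (seq_len, vocab_size).
    High-entropy positions mark uncertain (potentially pivotal) tokens."""
    z = logits - logits.max(axis=-1, keepdims=True)  # stabilize softmax
    p = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    return -(p * np.log(p + 1e-12)).sum(axis=-1)

def group_advantage(rewards):
    """Normalize each candidate's self-evaluated reward against its
    sampled group (assumed mean/std normalization, as in group-relative
    policy-gradient methods): positive for above-average candidates."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + 1e-8)
```

For a group of G sampled CoTs per query, one would score each candidate with the model's own judgment, modulate those scores by entropy-derived token weights, and use the resulting group advantages as per-candidate weights in the policy update.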