Singapore

Large-language-model (LLM) agents excel at reactive dialogue but struggle with proactive, goal-driven interactions due to myopic decoding and costly planning. We introduce DialogXpert, which leverages a frozen LLM to propose a small, high-quality set of candidate actions per turn and employs a compact Q-network over fixed BERT embeddings trained via temporal-difference learning to select optimal moves within this reduced space. By tracking the user's emotions, DialogXpert tailors each decision to advance the task while nurturing a genuine, empathetic connection. Across negotiation, emotional support, and tutoring benchmarks, DialogXpert drives conversations to under 3 turns with success rates exceeding 94% and, with a larger LLM prior, pushes success above 97% while markedly improving negotiation outcomes. This framework delivers real-time, strategic, and emotionally intelligent dialogue planning at scale.
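The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it swaps the compact Q-network for a linear Q-function, uses hand-made 3-dimensional vectors in place of BERT embeddings, and the action names, `alpha`, and `gamma` are hypothetical values chosen only for illustration.

```python
import random

def q_value(weights, features):
    """Linear Q estimate: dot product of weights and a fixed feature vector."""
    return sum(w * x for w, x in zip(weights, features))

def select_action(weights, candidates, epsilon=0.0, rng=random):
    """Choose among the few candidates proposed by the prior (epsilon-greedy)."""
    if rng.random() < epsilon:
        return rng.choice(list(candidates))
    return max(candidates, key=lambda a: q_value(weights, candidates[a]))

def td_update(weights, features, reward, next_candidates, done,
              alpha=0.1, gamma=0.9):
    """One temporal-difference step toward reward + gamma * max_a' Q(s', a')."""
    next_best = 0.0
    if not done and next_candidates:
        next_best = max(q_value(weights, f) for f in next_candidates.values())
    td_error = reward + gamma * next_best - q_value(weights, features)
    return [w + alpha * td_error * x for w, x in zip(weights, features)]

# Hypothetical candidate set: in the paper's setting a frozen LLM would
# propose these, and the vectors would be fixed BERT embeddings.
candidates = {
    "ask_about_feelings": [1.0, 0.0, 0.5],
    "propose_price": [0.0, 1.0, 0.5],
}
weights = [0.0, 0.0, 0.0]

# Repeatedly reward one action in a terminal state and watch Q move toward 1.
for _ in range(50):
    weights = td_update(weights, candidates["ask_about_feelings"],
                        reward=1.0, next_candidates={}, done=True)

best = select_action(weights, candidates)
print(best, round(q_value(weights, candidates[best]), 3))
```

Because the maximum is taken only over the proposed candidates, the cost of each decision scales with the size of that small set rather than with the full action space.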

AAAI 2026

DialogXpert: Driving Intelligent and Emotion-Aware Conversations Through Online Value-Based Reinforcement Learning with LLM Priors

nlp: conversational ai/dialog systems; nlp: applications; nlp: learning & optimization for nlp; nlp: (large) language models


technical paper

We are pleased to announce the Fortieth AAAI Conference on Artificial Intelligence (AAAI-26), which will be held in Singapore EXPO from January 20 to January 27, 2026.

The purpose of the AAAI conference series is to promote research in Artificial Intelligence (AI) and foster scientific exchange between researchers, practitioners, scientists, students, and engineers across the entirety of AI and its affiliated disciplines. AAAI-26 will feature technical paper presentations, special tracks, invited speakers, workshops, tutorials, poster sessions, senior member presentations, competitions, and exhibit programs, and a range of other activities to be announced.



From expert AI systems of the 1970s to self-supervised
systems of the 2020s, the pendulum of AI development has
swung from heavy reliance on human feedback to no or
minimal reliance in the last 50 years. Self-supervised
approaches have contributed significantly to the success
and scalable development of AI. However, today we are at a
tipping point where the future of AI, and whether society
ends up benefiting from this technology in the long run,
depends critically on subsequent AI development
aligning with human goals and values. In recognition of this,
efforts to align AI models with human expectations and
values have been ramping up. Human feedback, however,
remains limited and difficult to elicit. Thus, a key
question lingers: how can we scale up alignment of AI
systems with individual expectations and societal norms?
This talk and paper provide an overview and perspective on
efforts at answering this question.

Scaling Up AI Alignment

The development of novel, effective medical treatments is one of the most important and anticipated benefits of the AI revolution. This decade is witnessing the rise of AI models able to predict complex properties of protein-protein interactions, which hold great promise for assisting the development of antibody therapeutics and vaccines, including for diseases for which an effective treatment has long eluded us. This paper introduces this area of research in language accessible to an AI researcher, exploring the biological problems that AI models can solve, as well as the general context needed to make solutions feasible in practical scenarios. We survey the main current trends and works in this research area and point to challenges and trade-offs that remain unsolved. We expect this paper to be helpful for AI researchers trying to join the field, as well as for researchers already working in one of the subtopics who wish to better understand the general context around it.

Machine Learning Models Assisting the Development of Antibody Therapeutics and Vaccines – an Emerging Trend

Minimal parametrization of 3D lines plays a critical role in camera localization and structural mapping. 
Existing representations in robotics and computer vision predominantly handle independent lines, 
overlooking structural regularities such as sets of parallel lines that are pervasive in man-made environments. 
This paper introduces \textbf{RiemanLine}, a unified minimal representation for 3D lines formulated on Riemannian manifolds that jointly accommodates both individual lines and parallel-line groups. 
Our key idea is to decouple each line landmark into global and local components: 
a shared vanishing direction optimized on the unit sphere $\mathcal{S}^2$, 
and scaled normal vectors constrained on orthogonal subspaces, enabling compact encoding of structural regularities. 
For $n$ parallel lines, the proposed representation reduces the parameter space from $4n$ (orthonormal form) to $2n+2$, naturally embedding parallelism without explicit constraints. 
We further integrate this parameterization into a factor graph framework, allowing global direction alignment and local reprojection optimization within a unified manifold-based bundle adjustment. 
Extensive experiments on ICL-NUIM, TartanAir, and synthetic benchmarks demonstrate that our method achieves significantly more accurate pose estimation and line reconstruction, while reducing parameter dimensionality and improving convergence stability.
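The parameter count in the abstract can be checked with a small sketch. It rests on one reading of the abstract, not on the paper's actual parameterization: a shared unit direction contributes 2 degrees of freedom, and each line adds a scaled normal confined to the 2D plane orthogonal to that direction. The helper names and the numeric values are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cross(u, v):
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]

def normalize(u):
    norm = math.sqrt(dot(u, u))
    return [a / norm for a in u]

def orthogonal_basis(d):
    """Two unit vectors spanning the plane orthogonal to the unit direction d."""
    axes = ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0])
    # Use the axis least aligned with d so the cross product is never zero.
    axis = min(axes, key=lambda e: abs(dot(e, d)))
    b1 = normalize(cross(d, axis))
    b2 = cross(d, b1)
    return b1, b2

def scaled_normal(d, a, b):
    """Per-line component: two free coordinates (a, b) in the orthogonal plane."""
    b1, b2 = orthogonal_basis(d)
    return [a * x + b * y for x, y in zip(b1, b2)]

def param_counts(n):
    """Parameters for n parallel lines: independent 4-DoF lines vs. shared direction."""
    return {"orthonormal": 4 * n, "parallel_group": 2 * n + 2}

direction = normalize([1.0, 2.0, 2.0])           # shared vanishing direction
local_coords = [(0.5, 1.0), (-2.0, 0.3), (1.5, -1.5)]
normals = [scaled_normal(direction, a, b) for a, b in local_coords]
counts = param_counts(len(normals))
print(counts)
```

Every normal built this way is orthogonal to the shared direction by construction, which is how parallelism can hold without an explicit constraint in the optimizer.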

RiemanLine: Riemannian Manifold Representation of 3D Lines for Factor Graph Optimization

Humanoid robots hold promise for learning a diverse set of human-like locomotion behaviors, including standing up, walking, running, and jumping. However, existing methods predominantly require training independent policies for each skill, yielding behavior-specific controllers that exhibit limited generalization and brittle performance when deployed on irregular terrains and in diverse situations. To address this challenge, we propose Adaptive Humanoid Control (AHC), which adopts a two-stage framework to learn an adaptive humanoid locomotion controller across different skills and terrains. Specifically, we first train several primary locomotion policies and perform a multi-behavior distillation process to obtain a basic multi-behavior controller, facilitating adaptive behavior switching based on the environment. Then, we perform reinforced fine-tuning by collecting online feedback while performing adaptive behaviors on more diverse terrains, enhancing the terrain adaptability of the adaptive behavior controller. We conduct experiments both in simulation and in the real world on Unitree G1 robots. The results show that our method exhibits strong adaptability across various situations and terrains.

Towards Adaptive Humanoid Control via Multi-Behavior Distillation and Reinforced Fine-Tuning

Machine learning based predictions are increasingly used in sensitive decision-making applications that directly affect our lives. This has led to extensive research into ensuring the fairness of classifiers. Beyond just fair classification, emerging legislation now mandates that when a classifier delivers a negative decision, it must also offer actionable steps an individual can take to reverse that outcome. This concept is known as _algorithmic recourse_. Nevertheless, many researchers have expressed concerns about the fairness guarantees within the recourse process itself. In this work, we provide a theoretical characterization of unfairness in algorithmic recourse, formally linking fairness guarantees in recourse and classification, and highlighting limitations of the standard equal cost paradigm. We then introduce a novel fairness framework based on _social burden_, along with a practical algorithm **MISOB**, broadly applicable under real-world conditions. Empirical results on real-world datasets show that **MISOB** reduces the social burden across all groups without compromising overall classifier accuracy.

Revisiting (Un)Fairness in Recourse by Minimizing Worst-Case Social Burden

Graph neural networks (GNNs) have demonstrated impressive performance in a broad spectrum of fields, but they suffer from poor generalization when confronted with out-of-distribution (OOD) scenarios. The information bottleneck (IB) principle, which endeavors to learn minimally sufficient representations for downstream tasks, has been shown to be a promising strategy for dealing with this problem. However, IB-based methods do not inherently distinguish between causal and non-causal parts of the graph, leading to underperforming OOD generalization. In this paper, we develop the Graph Causal Information Bottleneck (GCIB) framework, a causal extension of the IB for graph data, which is capable of jointly compressing abundant information and capturing causal dependency from the input graph. Specifically, we endow graph IB with the ability to maintain causal control by incorporating the underlying causal structure and introducing an intervention operation. On this basis, we formulate the learning objective for GCIB and present its specific implementation. Graph representations learned by GCIB can effectively preserve the causal information that fundamentally determines graph properties, resulting in outstanding OOD generalization ability. Extensive experiments on both synthetic and real-world datasets demonstrate the superiority of GCIB over state-of-the-art baselines.

GCIB: Causal Intervention Guided Graph Information Bottleneck Framework

Drug-drug interaction (DDI) prediction is pivotal for drug safety and clinical decision-making. Recently, subgraph-based methods utilizing knowledge graphs (KGs) and domain information have achieved promising results by extracting informative subgraphs for DDI prediction. However, existing subgraph extraction methods are typically coarse-grained and nonspecific, facing two key limitations. First, they are constrained by the vast and noisy nature of real-world KGs, making it challenging to identify the most informative substructures from the massive space of candidate subgraphs. Second, current methods often fail to exploit the molecular structural specificity of drugs to selectively extract relevant subgraphs, lacking effective integration of molecular structure information with knowledge graph context. To address these challenges, we propose RISE-DDI, a novel Reinforced-based Informative Subgraph Extraction framework for drug-drug interaction prediction. Specifically, RISE-DDI formulates subgraph extraction as a Markov Decision Process (MDP) and leverages a deep reinforcement learning (RL) agent to dynamically and adaptively extract the most informative and context-specific subgraphs for each drug pair. The agent is guided by a learnable structure-aware reward model that considers both the topological context from the knowledge graph and the molecular features of the drug pairs, thereby encouraging the selection of subgraphs that are both structurally relevant and biologically informative. Extensive experiments on DDI benchmark datasets demonstrate that our method outperforms state-of-the-art baselines in both transductive and inductive scenarios, achieving improvements of up to 20%. Furthermore, visualization analyses of the extracted subgraphs highlight the interpretability of our model, providing insights into the underlying mechanisms of drug interactions.

Informative Subgraph Extraction with Deep Reinforcement Learning for Drug-Drug Interaction Prediction

Explanation fidelity, which measures how accurately an explanation reflects a model’s true reasoning, remains critically underexplored in recommender systems. We introduce SPINRec (Stochastic Path Integration for Neural Recommender Explanations), a model-agnostic explanation method that adapts path-integration techniques to the sparse and implicit nature of recommendation data. To address the limitations of prior approaches, SPINRec employs a stochastic baseline sampling strategy. Instead of integrating from a fixed or unrealistic baseline, it samples multiple plausible user profiles from the empirical data distribution and selects the most faithful attribution path. This design accounts for the importance of both observed and unobserved interactions in modern recommenders, resulting in more stable, accurate, and personalized explanations. We conduct the most comprehensive fidelity evaluation to date in this domain. Our experiments span three models (MF, VAE, NCF), three datasets (ML1M, Yahoo! Music, Pinterest), and a suite of counterfactual metrics, including AUC-based perturbation curves and fixed-length diagnostics. SPINRec consistently outperforms strong baselines such as SHAP, LIME, FIA, and LXR across all evaluation settings. These results establish a new benchmark for faithful explainability in recommendation. Code and evaluation tools will be released publicly to support reproducibility and future research.

Fidelity-Aware Recommendation Explanations via Stochastic Path Integration

Clustering is a fundamental task in machine learning and data analysis, but it frequently fails to provide fair representation for various marginalized communities defined by multiple protected attributes -- a shortcoming often caused by biases in the training data. As a result, there is a growing need to enhance the fairness of clustering outcomes, ideally by making minimal modifications, possibly as a post-processing step after conventional clustering. Recently, Chakraborty et al. [COLT'25] initiated the study of \emph{closest fair clustering}, though in a restricted scenario where data points belong to only two groups. In practice, however, data points are typically characterized by many groups, reflecting diverse protected attributes such as age, ethnicity, gender, etc.

In this work, we generalize the study of the \emph{closest fair clustering} problem to settings with an arbitrary number (more than two) of groups. We begin by showing that the problem is NP-hard even when all groups are of equal size -- a stark contrast with the two-group case, for which an exact algorithm exists. Next, we propose near-linear time approximation algorithms that efficiently handle arbitrary-sized multiple groups, thereby answering an open question posed by Chakraborty et al. [COLT'25].

Leveraging our closest fair clustering algorithms, we further achieve improved approximation guarantees for the \emph{fair correlation clustering} problem, advancing the state-of-the-art results established by Ahmadian et al. [AISTATS'20] and Ahmadi et al. [2020]. Additionally, we are the first to provide approximation algorithms for the \emph{fair consensus clustering} problem involving multiple (more than two) groups, thus addressing another open direction highlighted by Chakraborty et al. [COLT'25].

Generalizing Fair Clustering to Multiple Groups: Algorithms and Applications

In large-scale recommendation systems like LinkedIn’s, the retrieval stage is critical for narrowing billions of potential candidates to a manageable subset for ranking. LinkedIn's feed now serves suggested content based on the topical interests of members, where 2000 candidates are retrieved from several million with a latency budget of a few milliseconds and inbound traffic of several thousand queries per second. This paper presents a novel retrieval approach that fine-tunes a large causal language model (Meta’s LLaMA 3) as a dual encoder to generate high-quality embeddings for both users (members) and content (items), using only textual input. We describe the end-to-end pipeline, including prompt design for embedding generation, techniques for fine-tuning at LinkedIn scale, and infrastructure for low-latency, cost-effective online serving. We share our findings on how quantizing numerical features in the prompt allows that information to be encoded in the embedding, improving alignment between the retrieval and ranking layers. The system was evaluated using offline metrics and an online A/B test, which showed substantial improvements in member engagement. We observed significant gains among newer members, who often lack strong network connections, indicating that high-quality suggested content aids retention. This work demonstrates how generative language models can be effectively adapted for real-time, high-throughput retrieval in industrial applications.
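One way to picture "quantizing numerical features in the prompt" is to replace each raw number with a bucket index before it is rendered as text. The sketch below is only an assumption about what such a step might look like: the bucket boundaries, feature names, and prompt template are all hypothetical, and the abstract does not describe the scheme actually used.

```python
from bisect import bisect_right

def quantize(value, boundaries):
    """Map a raw number to a bucket index given sorted bucket boundaries."""
    return bisect_right(boundaries, value)

def render_prompt(item_text, numeric_features, boundaries_by_name):
    """Build a text-only prompt in which numbers appear as bucket tokens."""
    parts = [item_text]
    for name, value in sorted(numeric_features.items()):
        bucket = quantize(value, boundaries_by_name[name])
        parts.append(f"{name}_bucket={bucket}")
    return " | ".join(parts)

# Hypothetical item with two engagement counts and hand-picked boundaries.
prompt = render_prompt(
    "Post about distributed systems",
    {"likes": 37, "comments": 4},
    {"likes": [10, 100, 1000], "comments": [1, 5, 25]},
)
print(prompt)
```

Bucketing gives the language model a small, repeated vocabulary for magnitudes instead of arbitrary digit strings, which is one plausible reason such features would become easier to encode in the embedding.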
