Singapore

Large Language Models (LLMs) often exhibit sycophantic behavior, agreeing with user-stated opinions even when those contradict factual knowledge. While prior work has documented this tendency, the internal mechanisms that enable such behavior remain poorly understood. In this paper, we provide a mechanistic account of how sycophancy arises within LLMs. We first systematically study how user opinions induce sycophancy across different model families. We find that simple opinion statements reliably induce sycophancy, whereas user expertise framing has a negligible impact. Through logit-lens analysis and causal activation patching, we identify a two-stage emergence of sycophancy: (1) a late-layer output preference shift and (2) deeper representational divergence. We also verify that user authority fails to influence behavior because models do not encode it internally. In addition, we examine how grammatical perspective affects sycophantic behavior, finding that first-person prompts (“I believe...”) consistently induce higher sycophancy rates than third-person framings (“They believe...”) by creating stronger representational perturbations in deeper layers. These findings highlight that sycophancy is not a surface-level artifact but emerges from a structural override of learned knowledge in deeper layers, with implications for alignment and truthful AI systems.

AAAI 2026

When Truth Is Overridden: Uncovering the Internal Origins of Sycophancy in Large Language Models

explainable ml

interpretability

language models

poster

We are pleased to announce the Fortieth AAAI Conference on Artificial Intelligence (AAAI-26), which will be held in Singapore EXPO from January 20 to January 27, 2026.

The purpose of the AAAI conference series is to promote research in Artificial Intelligence (AI) and foster scientific exchange between researchers, practitioners, scientists, students, and engineers across the entirety of AI and its affiliated disciplines. AAAI-26 will feature technical paper presentations, special tracks, invited speakers, workshops, tutorials, poster sessions, senior member presentations, competitions, and exhibit programs, and a range of other activities to be announced.<br><br>

To access this event page, you need to log in with the **email address you registered with**. <br>Access credentials will be sent to your email from Underline -  subject line "Welcome to AAAI 2026". Please be sure to check your spam email folder if you do not see an email confirmation right away.

Please log in

To access this event page, you are required to register.
Please complete your registration to continue.

We recommend reading [**the registration information**](https://aaai.org/conference/aaai/aaai-26/registration/) first.

**Online Registration Form**: https://aaai.getregistered.net/conference-2026 

Registration Required

We are pleased to announce the Fortieth AAAI Conference on Artificial Intelligence (AAAI-26), which will be held in Singapore EXPO from January 20 to January 27, 2026.

Comprehensively interpreting human behavior is a core challenge in human-aware artificial intelligence. However, prior works typically focused on body behavior, neglecting the crucial role of eye gaze and its synergy with body motion. We present GazeInterpreter -- a novel large language model-based (LLM-based) approach that parses eye gaze data to generate eye-body-coordinated narrations. Specifically, our method features 1) a symbolic gaze parser that translates raw gaze signals into symbolic gaze events; 2) a hierarchical structure that first uses an LLM to generate eye gaze description and then integrates gaze with body motion based on the temporal coherence of historical context to produce comprehensive narration; and 3) a self-correcting loop that iteratively refines the modality match, temporal coherence, and completeness of the integrated narration. We extensively evaluate our method for text-driven motion generation on the large-scale Nymeria benchmark and demonstrate that our method outperforms the state-of-the-art performance. Complementing these evaluations, we further report significant performance improvements for the sample downstream tasks of action anticipation and behavior summarization. Taken together, these results reveal the significant potential of parsing eye gaze to interpret human behavior and open up a new direction for human behavior understanding.

GazeInterpreter: Parsing Eye Gaze to Generate Eye-Body-Coordinated Narrations

Consider a system of multiple physical agents tasked with collaboratively collecting a set of spatially distributed goals as quickly as possible while avoiding collisions with the environment and with each other. This type of problem, which involves Multi-Agent Path Finding (MAPF) and task allocation, is called Multi-Agent Combinatorial Path Finding (MACPF). Prior work on MACPF assumed each agent has a final goal it must reach, there are no orientation constraints on the agents' movements, and the agents will follow their planned actions as intended. These assumptions rarely hold in real physical robots, which limits the applicability of existing MACPF algorithms in practical applications. We propose the Robust CBSS framework, a robust planning approach that solves MACPF without the aforementioned simplifying assumptions, and provide two implementations: a baseline version (RCbssBase) and an efficient version (RCbssEff). RCbssEff generalizes the Conflict-Based Steiner Search (CBSS) algorithm, building on ideas from the p-Robust CBS algorithm and algorithms for solving the Equality Generalized Traveling Salesman Problem. We prove that RCbssEff is complete and can be configured to return optimal solutions. Experimental results on benchmark MACPF problems show that RCbssEff balances planning time, solution cost, and collision reduction compared to baselines.

Robust Multiagent Combinatorial Path Finding

Generating safe and reliable trajectories for autonomous vehicles in long-tail scenarios remains a significant challenge. High-lateral-acceleration maneuvers (e.g., sharp turns) represent a critical subset of these rare but high-risk situations. Existing trajectory planners, often trained on imbalanced datasets, struggle in these scenarios due to insufficient exposure to relevant data, leading to incomplete decision-making information at inference time, particularly concerning the precise interplay of vehicle dynamics, road geometry, and environmental constraints at the limits of handling. Consequently, these planners generate suboptimal or unsafe trajectory predictions in high-lateral-acceleration scenarios. To address this gap, we introduce ReflexDiffusion, a novel inference-stage framework that enhances diffusion-based trajectory planners through reflective adjustment. During iterative denoising, after each standard trajectory update step, we inject an additional adjustment: explicitly amplifying critical conditioning signals (e.g., road curvature, ego lateral dynamics) by computing the gradient between conditional and unconditional noise predictions. This forces the trajectory to strictly adhere to physical constraints, especially improving stability during high-lateral-acceleration maneuvers where precise vehicle-road interaction is paramount. Evaluated on nuPlan Test14-hard benchmark, our approach elevates the driving score for high-lateral-acceleration scenarios by 14.1% over the state-of-the-art. This demonstrates that our method dynamically reinforces safety-critical constraints at handling limits, effectively compensating for training data sparsity through inference-time trajectory optimization. Moreover, our approach is highly generalizable and can be deployed to other diffusion-based planners without modifying the model architecture.

ReflexDiffusion: Reflection-Enhanced Trajectory Planning for High-lateral-acceleration Scenarios in Autonomous Driving

This paper presents a systematic investigation into the constrained generation capabilities of large language models (LLMs) in producing \textit{Songci}, a classical Chinese poetry form characterized by strict structural, tonal, and rhyme constraints defined by Cipai templates. We first develop a comprehensive, multi-faceted evaluation framework that includes: (i) a formal conformity score, (ii) automated quality assessment using LLMs, (iii) human evaluation, and (iv) classification-based probing tasks. Using this framework, we evaluate the generative performance of 18 LLMs, including 3 proprietary models and 15 open-source models across 4 families, under five prompting strategies: zero-shot, one-shot, completion-based, instruction-tuned, and chain-of-thought. Finally, we propose a Generate-Critic architecture in which the evaluation framework functions as an automated critic. Leveraging the critic’s feedback as a reward signal, we fine-tune 3 lightweight open-source LLMs via supervised fine-tuning (SFT), resulting in improvements of up to \textbf{5.88\%} in formal conformity. Our findings offer new insights into the generative strengths and limitations of LLMs in producing culturally significant and formally constrained literary texts.

PoeTone: A Framework for Constrained Generation of Structured Chinese Songci with LLMs

To date, distributional reinforcement learning (distributional RL) methods have exclusively focused on the discounted setting, where an agent aims to optimize a potentially-discounted sum of rewards over time. In this work, we extend distributional RL to the average-reward setting, where an agent aims to optimize the reward received per time-step. In particular, we utilize a quantile-based approach to develop the first set of algorithms that can successfully learn and/or optimize the long-run per-step reward distribution, as well as the differential return distribution of an average-reward MDP. We derive proven-convergent tabular algorithms for both prediction and control, as well as a broader family of algorithms that have appealing scaling properties. Empirically, we find that these algorithms yield competitive and sometimes superior performance when compared to their non-distributional equivalents, while also capturing rich information about the long-run per-step reward and differential return distributions.

A Differential Perspective on Distributional Reinforcement Learning

Human-defined creativity is highly abstract, posing a challenge for multimodal large language models (MLLMs) to comprehend and assess creativity that aligns with human judgments. The absence of an existing benchmark further exacerbates this dilemma. To this end, we propose CreBench, which consists of two key components: 1) an evaluation benchmark covering the multiple dimensions from creative idea to process to products; 2) CreMIT (Creativity Multimodal Instruction Tuning dataset), a multimodal creativity evaluation dataset, consisting of 2.2K diverse-sourced multimodal data, 79.2K human feedbacks and 4.7M multi-typed instructions. Specifically, to ensure MLLMs can handle diverse creativity-related queries, we prompt GPT to refine these human feedbacks to activate stronger creativity assessment capabilities. CreBench serves as a foundation for building MLLMs that understand human-aligned creativity. Based on the CreBench, we fine-tune open-source general MLLMs, resulting in CreExpert, a multimodal creativity evaluation expert model. Extensive experiments demonstrate that the proposed CreExpert models achieve significantly better alignment with human creativity evaluation compared to state-of-the-art MLLMs, including the most advanced GPT-4V and Gemini-Pro-Vision. To our knowledge, we are the first to propose a benchmark for creativity evaluation.

CreBench: Human-Aligned Creativity Evaluation from Idea to Process to Product

Trip recommendation aims to generate a sequence of points of interest (POIs) under a user's query input. Existing data-driven methods mainly fall into two categories: supervised approaches and self-supervised approaches. The former cannot fully capture the transition patterns among POIs, while the latter fail to comprehensively model user's query intents.
Fortunately, privileged knowledge distillation (PKD) provides us an unique opportunity to align user's query intents with its corresponding trip in historical data. However, such knowledge alignment is implicit, which may not directly reflect the query intents. To this end, in this paper, we propose EKD-Trip, an explicit intent-enhanced knowledge distillation framework. EKD-Trip first trains a trajectory encoder (teacher model) and a trip generator jointly in a self-supervised manner. Then, a query encoder (student model) is trained via multi-task learning to extract implicit knowledge by PKD from teacher and explicit knowledge from an auxiliary task, respectively. At inference time, we use the query encoder and the trip generator to recommend trips. Extensive experiments on four real-world datasets demonstrate that EKD-Trip outperforms all baselines over three metrics, with a particularly notable improvement of 13.70% in pairs-F1.

Explicit Intent-Enhanced Knowledge Distillation for Trip Recommendation

Numerical reasoning over documents, which demands both contextual understanding and logical inference, is challenging for low-capacity local models deployed on computation-constrained devices. Although such complex reasoning queries could be routed to powerful remote models like GPT-4, exposing local data raises significant data leakage concerns. Existing mitigation methods generate problem descriptions or examples for remote assistance. However, the inherent complexity of numerical reasoning hinders the local model from generating logically equivalent queries and accurately inferring answers with remote guidance. In this paper, we present a model collaboration framework with two key innovations: (1) a context-aware synthesis strategy that shifts the query topics while preserving reasoning patterns; and (2) a tool-based answer reconstruction approach that reuses the remote-generated plug-and-play solution with code snippets. Experimental results demonstrate that our method achieves better reasoning accuracy than solely using local models while providing stronger data protection than fully relying on remote models. Furthermore, our method improves accuracy by 16.2\% - 43.6\% while reducing data leakage by 2.3\% - 44.6\% compared to existing data protection approaches.

Collaborative LLM Numerical Reasoning with Local Data Protection

Personalized image generation is crucial for improving the user experience, as it renders reference images into preferred ones according to user visual preferences. Although effective, existing methods face two main issues. First, existing methods treat all items in the user's historical sequence equally when extracting user preferences, overlooking the varying semantic similarities between historical items and the reference item. Disproportionately high weights for low-similarity items distort user visual preferences for the reference item. Second, existing methods heavily rely on consistency between generated and reference images to optimize generation, which leads to underfitting user preferences and hinders personalization. To address these issues, we propose Retrieval Augmented Personalized Image GenerAtion guided by Recommendation (RAGAR). Our approach uses a retrieval mechanism to assign different weights to historical items according to their similarities to the reference item, thereby extracting more refined users' visual preferences for the reference item. Then we introduce a novel rank task based on the multi-modal ranking model to optimize the personalization of the generated images instead of forcing depend on consistency. Extensive experiments and human evaluations on three real-world datasets demonstrate that RAGAR achieves significant improvements in both personalization and semantic metrics compared to five baselines.

RAGAR: Retrieval Augmented Personalized Image Generation Guided by Recommendation

Traditional text-to-motion frameworks often lack precise control, and existing approaches based on joint keyframe locations provide only positional guidance, making it challenging and unintuitive to specify body part orientations and motion timing. To address these limitations, we introduce the Salient Orientation Symbolic (SOS) script, a programmable symbolic framework for specifying body part orientations and motion timing at keyframes.
We further propose an automatic SOS extraction pipeline that employs temporally-constrained agglomerative clustering for frame saliency detection and a Saliency-based Masking Scheme (SMS) to generate sparse, interpretable SOS scripts directly from motion data. Moreover, we present the SOSControl framework, which treats the available orientation symbols in the sparse SOS script as salient and prioritizes satisfying these constraints during motion generation. By incorporating SMS-based data augmentation and gradient-based iterative optimization, the framework enhances alignment with user-specified constraints. Additionally, it employs a ControlNet-based ACTOR-PAE Decoder to ensure smooth and natural motion outputs.
Extensive experiments demonstrate that the SOS extraction pipeline generates human-interpretable scripts with symbolic annotations at salient keyframes, while the SOSControl framework outperforms existing baselines in motion quality, controllability, and generalizability with respect to motion timing and body part orientation control.

Downloads

Next from AAAI 2026

GazeInterpreter: Parsing Eye Gaze to Generate Eye-Body-Coordinated Narrations

Stay up to date with the latest Underline news!

PRESENTATIONS

CONFERENCES

COMPANY

RESOURCES

.css-70qvj9{display:-webkit-box;display:-webkit-flex;display:-ms-flexbox;display:flex;-webkit-align-items:center;-webkit-box-align:center;-ms-flex-align:center;align-items:center;}Downloads

Next from AAAI 2026

GazeInterpreter: Parsing Eye Gaze to Generate Eye-Body-Coordinated Narrations

Stay up to date with the latest Underline news!

PRESENTATIONS

CONFERENCES

COMPANY

RESOURCES

Downloads