Singapore

The emergence of accurate open large language models (LLMs) has sparked a push for advanced quantization techniques to enable efficient deployment on end-user devices. In this paper, we revisit the challenge of extreme LLM compression, targeting ultra-low-bit quantization for both activations and weights, from a Fourier frequency-domain perspective. We propose SpecQuant, a two-stage framework that tackles activation outliers and cross-channel variance. In the first stage, activation outliers are smoothed and transferred into the weight matrix to simplify downstream quantization. In the second stage, we apply channel-wise low-frequency Fourier truncation to suppress high-frequency components while preserving essential signal energy, improving quantization robustness. Our method builds on the principle that most of the weight energy is concentrated in low-frequency components, which can be retained with minimal impact on model accuracy. To enable runtime adaptability, we introduce a lightweight truncation module during inference that adjusts truncation thresholds based on channel characteristics. On LLaMA-3 8B, SpecQuant achieves 4-bit quantization for both weights and activations, narrowing the zero-shot accuracy gap to only 1.5% compared to full precision, while delivering 2× faster inference and 3× lower memory usage.
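As a rough illustration of the second stage, the sketch below applies a fixed low-pass Fourier truncation to a single weight channel and measures how much spectral energy survives. It is not the authors' implementation: the function names (`lowpass_truncate`, `energy_kept`), the fixed cutoff `keep`, and the naive pure-Python DFT are assumptions made for readability, and SpecQuant's adaptive per-channel thresholds are not modeled.

```python
import cmath
import math


def dft(x):
    """Naive O(n^2) discrete Fourier transform (illustrative, not fast)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Inverse of dft(); returns complex samples."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def _is_low(k, n, keep):
    # Bin k and bin n - k are conjugate mirrors for real input,
    # so both are judged by their distance from the DC bin.
    return min(k, n - k) <= keep


def lowpass_truncate(channel, keep):
    """Zero every frequency above `keep` and return the real reconstruction."""
    n = len(channel)
    spectrum = dft(channel)
    truncated = [c if _is_low(k, n, keep) else 0 for k, c in enumerate(spectrum)]
    return [v.real for v in idft(truncated)]


def energy_kept(channel, keep):
    """Fraction of spectral energy that lies in the retained low frequencies."""
    n = len(channel)
    spectrum = dft(channel)
    total = sum(abs(c) ** 2 for c in spectrum)
    kept = sum(abs(c) ** 2 for k, c in enumerate(spectrum) if _is_low(k, n, keep))
    return kept / total


if __name__ == "__main__":
    # Hypothetical channel: a smooth component plus a small alternating ripple.
    n = 16
    channel = [math.cos(2 * math.pi * t / n) + 0.05 * (-1) ** t for t in range(n)]
    print(energy_kept(channel, keep=2))
    print(lowpass_truncate(channel, keep=2)[:4])
```

In practice an FFT routine would replace the naive transform; the point here is only that discarding high-frequency bins can leave most of a smooth channel's energy intact.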

AAAI 2026

SpecQuant: Spectral Decomposition and Adaptive Truncation for Ultra-Low-Bit LLMs Quantization

llms

frequency domain

poster

We are pleased to announce the Fortieth AAAI Conference on Artificial Intelligence (AAAI-26), which will be held in Singapore EXPO from January 20 to January 27, 2026.

The purpose of the AAAI conference series is to promote research in Artificial Intelligence (AI) and foster scientific exchange between researchers, practitioners, scientists, students, and engineers across the entirety of AI and its affiliated disciplines. AAAI-26 will feature technical paper presentations, special tracks, invited speakers, workshops, tutorials, poster sessions, senior member presentations, competitions, and exhibit programs, and a range of other activities to be announced.

In formal strategic reasoning for Multi-Agent Systems (MAS), agents are typically assumed to (i) employ arbitrarily complex strategies, (ii) execute each move at zero cost, and (iii) operate over fully crisp game structures. These idealized assumptions stand in stark contrast with human decision-making in real-world environments. The natural strategies framework, along with some of its recent variants, partially addresses this gap by restricting strategies to concise rules guarded by regular expressions. Yet, it still overlooks both the cost of each action and the uncertainty that often characterizes human perception of facts over time. In this work, we introduce HumanATLF, a logic that builds upon natural strategies employing both fuzzy semantics and resource-bound actions: each action carries a real-valued cost drawn from a non-refillable budget, and atomic conditions and goals have degrees in [0,1]. We give a formal syntax and semantics, and prove that model checking is in P when both the strategy complexity k and resource budget b are fixed, NP-complete if just one strategic operator over Boolean objectives is allowed, and Delta^P_2-complete when k and b vary. Moreover, we show that recall-based strategies can be decided in PSPACE. We implement our algorithms in VITAMIN, an open-source model-checking tool for MAS, and validate them on an adversarial resource-aware drone rescue scenario.

When Natural Strategies Meet Fuzziness and Resource-Bounded Actions

Large Language Models (LLMs) often exhibit sycophantic behavior, agreeing with user-stated opinions even when those contradict factual knowledge. While prior work has documented this tendency, the internal mechanisms that enable such behavior remain poorly understood. In this paper, we provide a mechanistic account of how sycophancy arises within LLMs. We first systematically study how user opinions induce sycophancy across different model families. We find that simple opinion statements reliably induce sycophancy, whereas user expertise framing has a negligible impact. Through logit-lens analysis and causal activation patching, we identify a two-stage emergence of sycophancy: (1) a late-layer output preference shift and (2) deeper representational divergence. We also verify that user authority fails to influence behavior because models do not encode it internally. In addition, we examine how grammatical perspective affects sycophantic behavior, finding that first-person prompts (“I believe...”) consistently induce higher sycophancy rates than third-person framings (“They believe...”) by creating stronger representational perturbations in deeper layers. These findings highlight that sycophancy is not a surface-level artifact but emerges from a structural override of learned knowledge in deeper layers, with implications for alignment and truthful AI systems.

When Truth Is Overridden: Uncovering the Internal Origins of Sycophancy in Large Language Models

Comprehensively interpreting human behavior is a core challenge in human-aware artificial intelligence. However, prior works typically focused on body behavior, neglecting the crucial role of eye gaze and its synergy with body motion. We present GazeInterpreter, a novel large language model-based (LLM-based) approach that parses eye gaze data to generate eye-body-coordinated narrations. Specifically, our method features 1) a symbolic gaze parser that translates raw gaze signals into symbolic gaze events; 2) a hierarchical structure that first uses an LLM to generate eye gaze description and then integrates gaze with body motion based on the temporal coherence of historical context to produce comprehensive narration; and 3) a self-correcting loop that iteratively refines the modality match, temporal coherence, and completeness of the integrated narration. We extensively evaluate our method for text-driven motion generation on the large-scale Nymeria benchmark and demonstrate that our method outperforms the state of the art. Complementing these evaluations, we further report significant performance improvements for the sample downstream tasks of action anticipation and behavior summarization. Taken together, these results reveal the significant potential of parsing eye gaze to interpret human behavior and open up a new direction for human behavior understanding.

GazeInterpreter: Parsing Eye Gaze to Generate Eye-Body-Coordinated Narrations

Consider a system of multiple physical agents tasked with collaboratively collecting a set of spatially distributed goals as quickly as possible while avoiding collisions with the environment and with each other. This type of problem, which involves Multi-Agent Path Finding (MAPF) and task allocation, is called Multi-Agent Combinatorial Path Finding (MACPF). Prior work on MACPF assumed each agent has a final goal it must reach, there are no orientation constraints on the agents' movements, and the agents will follow their planned actions as intended. These assumptions rarely hold in real physical robots, which limits the applicability of existing MACPF algorithms in practical applications. We propose the Robust CBSS framework, a robust planning approach that solves MACPF without the aforementioned simplifying assumptions, and provide two implementations: a baseline version (RCbssBase) and an efficient version (RCbssEff). RCbssEff generalizes the Conflict-Based Steiner Search (CBSS) algorithm, building on ideas from the p-Robust CBS algorithm and algorithms for solving the Equality Generalized Traveling Salesman Problem. We prove that RCbssEff is complete and can be configured to return optimal solutions. Experimental results on benchmark MACPF problems show that RCbssEff balances planning time, solution cost, and collision reduction compared to baselines.

Robust Multiagent Combinatorial Path Finding

Generating safe and reliable trajectories for autonomous vehicles in long-tail scenarios remains a significant challenge. High-lateral-acceleration maneuvers (e.g., sharp turns) represent a critical subset of these rare but high-risk situations. Existing trajectory planners, often trained on imbalanced datasets, struggle in these scenarios due to insufficient exposure to relevant data, leading to incomplete decision-making information at inference time, particularly concerning the precise interplay of vehicle dynamics, road geometry, and environmental constraints at the limits of handling. Consequently, these planners generate suboptimal or unsafe trajectory predictions in high-lateral-acceleration scenarios. To address this gap, we introduce ReflexDiffusion, a novel inference-stage framework that enhances diffusion-based trajectory planners through reflective adjustment. During iterative denoising, after each standard trajectory update step, we inject an additional adjustment: explicitly amplifying critical conditioning signals (e.g., road curvature, ego lateral dynamics) by computing the gradient between conditional and unconditional noise predictions. This forces the trajectory to strictly adhere to physical constraints, especially improving stability during high-lateral-acceleration maneuvers where precise vehicle-road interaction is paramount. Evaluated on the nuPlan Test14-hard benchmark, our approach elevates the driving score for high-lateral-acceleration scenarios by 14.1% over the state of the art. This demonstrates that our method dynamically reinforces safety-critical constraints at handling limits, effectively compensating for training data sparsity through inference-time trajectory optimization. Moreover, our approach is highly generalizable and can be deployed to other diffusion-based planners without modifying the model architecture.
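The abstract does not give the exact form of the reflective adjustment. One familiar way to amplify a conditioning signal from the difference between conditional and unconditional noise predictions is the classifier-free-guidance combination, sketched below purely as an assumption. The function name `amplify_conditioning` and the parameter `scale` are hypothetical and do not come from the paper.

```python
def amplify_conditioning(eps_cond, eps_uncond, scale):
    """Combine two noise predictions in the classifier-free-guidance style.

    Assumed rule (not taken from the paper):
        eps = eps_uncond + scale * (eps_cond - eps_uncond)
    scale = 0 ignores the condition, scale = 1 returns the conditional
    prediction, and scale > 1 pushes further along the conditioning direction.
    """
    if len(eps_cond) != len(eps_uncond):
        raise ValueError("noise predictions must have the same length")
    return [u + scale * (c - u) for c, u in zip(eps_cond, eps_uncond)]


if __name__ == "__main__":
    # Toy two-dimensional noise predictions for one denoising step.
    conditional = [1.0, 2.0]
    unconditional = [0.5, 1.0]
    print(amplify_conditioning(conditional, unconditional, scale=2.0))
```

With `scale` above 1 the estimate moves past the conditional prediction, which is the sense in which conditioning signals such as road curvature would be emphasized in this assumed formulation.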

ReflexDiffusion: Reflection-Enhanced Trajectory Planning for High-lateral-acceleration Scenarios in Autonomous Driving

This paper presents a systematic investigation into the constrained generation capabilities of large language models (LLMs) in producing Songci, a classical Chinese poetry form characterized by strict structural, tonal, and rhyme constraints defined by Cipai templates. We first develop a comprehensive, multi-faceted evaluation framework that includes: (i) a formal conformity score, (ii) automated quality assessment using LLMs, (iii) human evaluation, and (iv) classification-based probing tasks. Using this framework, we evaluate the generative performance of 18 LLMs, including 3 proprietary models and 15 open-source models across 4 families, under five prompting strategies: zero-shot, one-shot, completion-based, instruction-tuned, and chain-of-thought. Finally, we propose a Generate-Critic architecture in which the evaluation framework functions as an automated critic. Leveraging the critic’s feedback as a reward signal, we fine-tune 3 lightweight open-source LLMs via supervised fine-tuning (SFT), resulting in improvements of up to 5.88% in formal conformity. Our findings offer new insights into the generative strengths and limitations of LLMs in producing culturally significant and formally constrained literary texts.

PoeTone: A Framework for Constrained Generation of Structured Chinese Songci with LLMs

To date, distributional reinforcement learning (distributional RL) methods have exclusively focused on the discounted setting, where an agent aims to optimize a potentially-discounted sum of rewards over time. In this work, we extend distributional RL to the average-reward setting, where an agent aims to optimize the reward received per time-step. In particular, we utilize a quantile-based approach to develop the first set of algorithms that can successfully learn and/or optimize the long-run per-step reward distribution, as well as the differential return distribution of an average-reward MDP. We derive proven-convergent tabular algorithms for both prediction and control, as well as a broader family of algorithms that have appealing scaling properties. Empirically, we find that these algorithms yield competitive and sometimes superior performance when compared to their non-distributional equivalents, while also capturing rich information about the long-run per-step reward and differential return distributions.
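To make the quantile-based idea concrete, the sketch below runs the standard stochastic quantile-regression update on a stream of per-step rewards. It is a generic textbook update, not the paper's algorithm: it treats rewards as a plain sample stream, ignores states, actions, and the differential return, and the names `quantile_step` and `estimate_reward_quantiles` are hypothetical.

```python
import random


def quantile_step(estimates, taus, sample, step_size):
    """One quantile-regression update.

    Each estimate moves up by step_size * tau when the sample is at or above
    it, and down by step_size * (1 - tau) when the sample falls below it.
    """
    return [
        q + step_size * (tau - (1.0 if sample < q else 0.0))
        for q, tau in zip(estimates, taus)
    ]


def estimate_reward_quantiles(rewards, taus, step_size=0.002):
    """Track the tau-quantiles of a reward stream, starting from zero."""
    estimates = [0.0] * len(taus)
    for reward in rewards:
        estimates = quantile_step(estimates, taus, reward, step_size)
    return estimates


if __name__ == "__main__":
    # Hypothetical reward stream: uniform rewards in [0, 1).
    rng = random.Random(0)
    rewards = [rng.random() for _ in range(50000)]
    print(estimate_reward_quantiles(rewards, taus=[0.25, 0.5, 0.75]))
```

Each estimate settles where the fraction of samples below it matches its target level, so a handful of such estimates gives a coarse picture of the per-step reward distribution rather than only its mean.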

A Differential Perspective on Distributional Reinforcement Learning

Human-defined creativity is highly abstract, posing a challenge for multimodal large language models (MLLMs) to comprehend and assess creativity that aligns with human judgments. The absence of an existing benchmark further exacerbates this dilemma. To this end, we propose CreBench, which consists of two key components: 1) an evaluation benchmark covering multiple dimensions from creative idea to process to products; 2) CreMIT (Creativity Multimodal Instruction Tuning dataset), a multimodal creativity evaluation dataset, consisting of 2.2K diverse-sourced multimodal data, 79.2K human feedback entries and 4.7M multi-typed instructions. Specifically, to ensure MLLMs can handle diverse creativity-related queries, we prompt GPT to refine this human feedback to activate stronger creativity assessment capabilities. CreBench serves as a foundation for building MLLMs that understand human-aligned creativity. Based on CreBench, we fine-tune open-source general MLLMs, resulting in CreExpert, a multimodal creativity evaluation expert model. Extensive experiments demonstrate that the proposed CreExpert models achieve significantly better alignment with human creativity evaluation compared to state-of-the-art MLLMs, including the most advanced GPT-4V and Gemini-Pro-Vision. To our knowledge, we are the first to propose a benchmark for creativity evaluation.

CreBench: Human-Aligned Creativity Evaluation from Idea to Process to Product

Trip recommendation aims to generate a sequence of points of interest (POIs) under a user's query input. Existing data-driven methods mainly fall into two categories: supervised approaches and self-supervised approaches. The former cannot fully capture the transition patterns among POIs, while the latter fail to comprehensively model the user's query intents. Fortunately, privileged knowledge distillation (PKD) provides a unique opportunity to align a user's query intents with the corresponding trip in historical data. However, such knowledge alignment is implicit, which may not directly reflect the query intents. To this end, in this paper, we propose EKD-Trip, an explicit intent-enhanced knowledge distillation framework. EKD-Trip first trains a trajectory encoder (teacher model) and a trip generator jointly in a self-supervised manner. Then, a query encoder (student model) is trained via multi-task learning to extract implicit knowledge by PKD from the teacher and explicit knowledge from an auxiliary task, respectively. At inference time, we use the query encoder and the trip generator to recommend trips. Extensive experiments on four real-world datasets demonstrate that EKD-Trip outperforms all baselines over three metrics, with a particularly notable improvement of 13.70% in pairs-F1.

Explicit Intent-Enhanced Knowledge Distillation for Trip Recommendation

Numerical reasoning over documents, which demands both contextual understanding and logical inference, is challenging for low-capacity local models deployed on computation-constrained devices. Although such complex reasoning queries could be routed to powerful remote models like GPT-4, exposing local data raises significant data leakage concerns. Existing mitigation methods generate problem descriptions or examples for remote assistance. However, the inherent complexity of numerical reasoning hinders the local model from generating logically equivalent queries and accurately inferring answers with remote guidance. In this paper, we present a model collaboration framework with two key innovations: (1) a context-aware synthesis strategy that shifts the query topics while preserving reasoning patterns; and (2) a tool-based answer reconstruction approach that reuses the remote-generated plug-and-play solution with code snippets. Experimental results demonstrate that our method achieves better reasoning accuracy than solely using local models while providing stronger data protection than fully relying on remote models. Furthermore, our method improves accuracy by 16.2% to 43.6% while reducing data leakage by 2.3% to 44.6% compared to existing data protection approaches.
