Singapore

Recent Large Reasoning Language Models (LRLMs) employ long chain-of-thought reasoning with complex reflection behaviors, typically signaled by specific trigger words (e.g., "Wait" and "Alternatively"), to enhance performance. However, these reflection behaviors can lead to the overthinking problem, in which the model generates redundant reasoning steps that unnecessarily increase token usage, raise inference costs, and reduce practical utility. In this paper, we propose Certainty-Guided Reflection Suppression (CGRS), a novel method that mitigates overthinking in LRLMs while maintaining reasoning accuracy. CGRS operates by dynamically suppressing the model's generation of reflection triggers when it exhibits high confidence in its current response, thereby preventing redundant reflection cycles without compromising output quality. Our approach is model-agnostic, requires no retraining or architectural modifications, and can be integrated seamlessly with existing autoregressive generation pipelines.
Extensive experiments across four reasoning benchmarks (i.e., AIME24, AMC23, MATH500, and GPQA-D) demonstrate CGRS's effectiveness: it reduces token usage by an average of 18.5% to 41.9% while preserving accuracy, and it achieves the best balance between length reduction and performance among state-of-the-art baselines. These results hold consistently across model architectures (e.g., the DeepSeek-R1-Distill series, QwQ-32B, and the Qwen3 family) and scales (4B to 32B parameters), highlighting CGRS's practical value for efficient reasoning.
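
The suppression step described above can be illustrated with a minimal sketch. It assumes a hard certainty threshold and a plain token-to-logit dictionary; the function names, the threshold value, and the way the certainty score is obtained are hypothetical and are not taken from the paper.

```python
import math
from typing import Dict, Iterable


def suppress_reflection_triggers(
    logits: Dict[str, float],
    certainty: float,
    trigger_tokens: Iterable[str],
    threshold: float = 0.9,  # assumed value, for illustration only
) -> Dict[str, float]:
    """Return a copy of `logits` with trigger tokens masked out when certainty is high."""
    if certainty < threshold:
        # Low certainty: leave the distribution untouched so the model may reflect.
        return dict(logits)
    triggers = set(trigger_tokens)
    # High certainty: a logit of -inf gives the trigger tokens zero probability.
    return {
        token: (-math.inf if token in triggers else score)
        for token, score in logits.items()
    }


def greedy_pick(logits: Dict[str, float]) -> str:
    """Pick the highest-scoring token (greedy decoding)."""
    return max(logits, key=logits.get)


step_logits = {"Wait": 2.0, "So": 1.5, "Alternatively": 1.0}
triggers = ["Wait", "Alternatively"]
confident = suppress_reflection_triggers(step_logits, 0.95, triggers)
uncertain = suppress_reflection_triggers(step_logits, 0.50, triggers)
```

Because the sketch only edits next-token logits, it would sit inside an ordinary decoding loop without changes to model weights, which is consistent with the training-free claim in the abstract.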

AAAI 2026

Efficient Reasoning for Large Reasoning Language Models via Certainty-Guided Reflection Suppression

nlp: (large) language models


poster

We are pleased to announce the Fortieth AAAI Conference on Artificial Intelligence (AAAI-26), which will be held at Singapore EXPO from January 20 to January 27, 2026.

The purpose of the AAAI conference series is to promote research in Artificial Intelligence (AI) and foster scientific exchange between researchers, practitioners, scientists, students, and engineers across the entirety of AI and its affiliated disciplines. AAAI-26 will feature technical paper presentations, special tracks, invited speakers, workshops, tutorials, poster sessions, senior member presentations, competitions, exhibit programs, and a range of other activities to be announced.


Multivariate Time Series Forecasting (MTSF) aims to capture the dependencies among multiple variables and their temporal dynamics to predict future values. In recent years, Large Language Models (LLMs) have set a new paradigm for MTSF, incorporating external knowledge into the modeling process through textual prompts. However, we observe that current LLM-based methods fail to exploit these priors due to their coarse-grained representation of time series data, which hinders effective alignment of the two modalities. To address this, we propose M3Time, a multi-modal, multi-scale, and multi-frequency framework for multivariate time series forecasting. It enhances the quality of time series representations and facilitates the integration of LLM semantic priors with fine-grained temporal features. Additionally, M3Time improves training stability and model robustness with an adaptive mixed loss function, which dynamically balances L1 and L2 error terms. Experimental results on seven real-world public datasets show that M3Time consistently outperforms state-of-the-art methods, underscoring its effectiveness.

M3Time: LLM-Enhanced Multi-Modal, Multi-Scale, and Multi-Frequency Multivariate Time Series Forecasting
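
To make the idea of a loss that balances L1 and L2 error terms concrete, here is a minimal sketch. The abstract does not state how the balance is computed, so the weighting rule below (`alpha = L2 / (L1 + L2)`) and all function names are hypothetical.

```python
from typing import Sequence


def l1_loss(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def l2_loss(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def adaptive_mixed_loss(
    pred: Sequence[float], target: Sequence[float], eps: float = 1e-12
) -> float:
    """Blend L1 and L2 with a data-dependent weight (hypothetical rule)."""
    l1 = l1_loss(pred, target)
    l2 = l2_loss(pred, target)
    # Assumed rule: when squared error dominates (large residuals),
    # shift weight toward the more outlier-robust L1 term.
    alpha = l2 / (l1 + l2 + eps)
    return alpha * l1 + (1.0 - alpha) * l2


loss = adaptive_mixed_loss([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```

In this example L1 is 2/3 and L2 is 4/3, so the assumed rule gives a weight of about 2/3 on the L1 term.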

Pretrained equivariant graph neural networks based on spherical harmonics offer efficient and accurate alternatives to computationally expensive ab-initio methods, yet adapting them to new tasks and chemical environments still requires fine-tuning. Conventional parameter-efficient fine-tuning (PEFT) techniques, such as Adapters and LoRA, typically break symmetry, making them incompatible with those equivariant architectures. ELoRA, recently proposed, is the first equivariant PEFT method. It achieves improved parameter efficiency and performance on many benchmarks. However, the relatively high degrees of freedom it retains within each tensor order can still perturb pretrained feature distributions and ultimately degrade performance. To address this, we present Magnitude-Modulated Equivariant Adapter (MMEA), a novel equivariant fine-tuning method which employs lightweight scalar gating to modulate feature magnitudes on a per-order and per-multiplicity basis. We demonstrate that MMEA preserves strict equivariance and, across multiple benchmarks, consistently improves energy and force predictions to state-of-the-art levels while training fewer parameters than competing approaches. These results suggest that, in many practical scenarios, modulating channel magnitudes is sufficient to adapt equivariant models to new chemical environments without breaking symmetry, pointing toward a new paradigm for equivariant PEFT design.

Magnitude-Modulated Equivariant Adapter for Parameter-Efficient Fine-Tuning of Equivariant Graph Neural Networks
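
The claim that scalar gating preserves equivariance can be checked numerically on a toy case. The sketch below assumes order-1 (vector) features, one scalar gate per channel, and rotations about the z-axis only; the helper names are hypothetical and this is not the paper's implementation.

```python
import math
from typing import List

Vector = List[float]


def rotate_z(v: Vector, angle: float) -> Vector:
    """Rotate a 3D vector about the z-axis by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    x, y, z = v
    return [c * x - s * y, s * x + c * y, z]


def gate_channels(features: List[Vector], gates: List[float]) -> List[Vector]:
    """Scale each vector channel by its own scalar gate."""
    return [[g * comp for comp in vec] for vec, g in zip(features, gates)]


features = [[1.0, 2.0, 3.0], [0.5, -1.0, 2.0]]  # two order-1 channels
gates = [0.3, 1.7]                               # one scalar per channel
angle = 0.8

# Rotating then gating should equal gating then rotating.
rotate_then_gate = gate_channels([rotate_z(v, angle) for v in features], gates)
gate_then_rotate = [rotate_z(v, angle) for v in gate_channels(features, gates)]
```

The two results agree because a rotation acts linearly on each channel, so multiplying a channel by a scalar commutes with it. A general linear map that mixes the components within a channel would not commute with rotations in this way.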

The task of stochastic human motion prediction has attracted significant attention in recent years due to its wide-ranging applications in robotics, animation, and human-computer interaction. While diffusion models have demonstrated promising progress in this domain, they remain hindered by two critical limitations: (1) slow inference speeds due to their reliance on iterative sampling, and (2) performance degradation resulting from suboptimal sample allocation during generation. To overcome these challenges, we propose SPARD (Single-step Inference with Adaptive Sampling in Residual Diffusion for Human Motion Prediction), a novel framework that achieves efficient single-step inference while maintaining high predictive accuracy. Furthermore, we introduce a novel adaptive noise predictor module that dynamically samples latent representations based on observed motion sequences, ensuring both accuracy and plausibility in generated motions. Extensive experiments on benchmark datasets demonstrate that SPARD significantly outperforms state-of-the-art methods in both inference efficiency and motion quality, achieving a 15× to 18× speedup in sampling time compared to conventional diffusion-based baselines while preserving generation quality.

SPARD: Single-step Inference with Adaptive Sampling in Residual Diffusion for Human Motion Prediction

Although deep learning-based image retouching has made significant progress, its inherent subjectivity renders current black-box methods limited in interactivity and explainability. Among existing efforts, parameter-controlled methods aim to improve interactivity, but often suffer from ambiguous semantics and lack support for natural language control. Reinforcement learning–based explainability methods are constrained by low-dimensional and limited action spaces, which result in suboptimal performance. To address the above issues, we propose RetouchAgent, a novel framework that leverages collaboration among multiple MLLM agents for image retouching. Our method consists of the following key steps: (1) Retrieval: By constructing a multimodal retouching database, we enable an ICL sample retrieval mechanism guided by retouching intent. (2) Engine: Leveraging the vision-language understanding capabilities of MLLM, a carefully designed prompting strategy, and a dedicated operation library, we enable precise and controllable image retouching. (3) Reflection: We evaluate each retouching interaction and optimize the retouching process for progressive result refinement. Finally, through multiple rounds of collaboration among MLLM agents, RetouchAgent achieves state-of-the-art performance in quantitative and qualitative evaluations.

RetouchAgent: Towards Interactive and Explainable Image Retouching with MLLM Agents

In multi-agent systems, explicit cognition of teammates' decision logic is a critical factor in facilitating coordination. Communication (i.e., "Tell") can assist the cognitive development process through information dissemination, yet it is inevitably subject to real-world constraints such as noise, latency, and attacks. Building an understanding of teammates' decisions without communication therefore remains challenging. To address this, we propose a novel non-communication MARL framework that constructs this cognition through local observation-based modeling (i.e., "Think"). Our framework enables agents to model teammates' active inference process. First, the proposed method produces three teammate portraits: perception, belief, and action. Specifically, we model the teammate's decision process as follows: 1) Perception: observing environments; 2) Belief: forming beliefs; 3) Action: making decisions. Then, we selectively integrate the belief portrait into the decision process based on the accuracy and relevance of the perception portrait. This enables the selection of cooperative teammates and facilitates effective collaboration. Extensive experiments on the SMAC, SMACv2, MPE, and GRF benchmarks demonstrate the superior performance of our method.

Think How Your Teammates Think: Active Inference Can Benefit Decentralized Execution

Dynamic graph learning focuses on representing time-varying graphs, enabling the modeling of evolving relationships between nodes. This approach is essential for applications such as traffic systems, social networks, and recommendation engines, where interactions shift dynamically. While existing methods often utilize temporal modules and transformer networks to capture these changes, a major challenge lies in the high computational demands of self-attention mechanisms, which scale quadratically with the number of nodes.
To address this, we propose a novel transformer-based framework for dynamic graph learning that incorporates a more efficient token mixer. Our key insight is that the Transformer's performance primarily stems from its architecture rather than the self-attention mechanism itself. Thus, we introduce an adaptive token mixer, which aggregates tokens based on their order and timing within a sliding window. Furthermore, we design a hierarchical learning module to capture long-term dependencies by leveraging long-range neighbor information across layers.
Our approach significantly reduces computational complexity while preserving the ability to model both short-term and long-term dependencies in dynamic graphs effectively. Experimental results demonstrate that our framework achieves robust performance, showing that simplified architectures can deliver competitive results without the resource-intensive requirements of traditional Transformers.

Global-Lens Transformers: Adaptive Token Mixing for Dynamic Link Prediction

The rapid iteration of Large Language Models (LLMs) has intensified the need for scalable, cost-efficient routing systems.
Current frameworks suffer from model lock-in, requiring exhaustive evaluations or retraining to integrate new models, which is a critical bottleneck in rapidly evolving LLM ecosystems. We present a zero-shot difficulty-aware framework that dynamically routes queries to optimal LLMs using only 100 anchor samples per new model. The framework introduces three innovations: (1) universal difficulty tiers that enable model-agnostic capability profiling, (2) a context-aware difficulty predictor that maps textual prompts to complexity scores without retroactive testing, and (3) a dual-mode ILP optimizer that balances cost and accuracy under varying constraints. Overall, by decoupling routing logic from model-specific data, our framework enables seamless integration of new LLMs, breaking the scalability limitations of existing systems. Our extensive experimental results demonstrate that the framework reduces the serving costs of newly onboarded models by 24.50% without any accuracy loss, and by up to 70.1% with only minor accuracy reductions.

Breaking Model Lock-in: Cost-Efficient Zero-Shot LLM Routing via a Universal Latent Space

We introduce **FinMMDocR**, a novel bilingual multimodal benchmark for evaluating multimodal large language models (MLLMs) on real-world financial numerical reasoning. Compared to existing benchmarks, our work delivers three major advancements. (1) **Scenario Awareness**: 57.9% of 1,200 expert-annotated problems incorporate 9 types of implicit financial scenarios (*e.g.,* Portfolio Management), challenging models to perform expert-level reasoning based on assumptions; (2) **Document Understanding**: 837 Chinese/English documents spanning 9 types (*e.g.,* Company Research) average 50.8 pages with rich visual elements, significantly surpassing existing benchmarks in both breadth and depth of financial documents; (3) **Multi-Step Computation**: Problems demand 11-step reasoning on average (5.3 extraction + 5.7 calculation steps), with 65.0% requiring cross-page evidence (2.4 pages on average). The best-performing MLLM achieves only 58.0% accuracy, and different retrieval-augmented generation (RAG) methods show significant performance variations on this task. We expect FinMMDocR to drive improvements in MLLMs and reasoning-enhanced methods on complex multimodal reasoning tasks in real-world scenarios. Data and code are available in the supplementary material.

FinMMDocR: Benchmarking Financial Multimodal Reasoning with Scenario Awareness, Document Understanding, and Multi-Step Computation

In this paper, we establish non-asymptotic central limit theorems for linear two-timescale stochastic approximation (TTSA) algorithms driven by martingale difference or Markov noise. Focusing on both the last iterate and Polyak–Ruppert averaging regimes, we derive bounds for normal approximation in terms of the convex distance between probability distributions. Our analysis reveals a non-trivial interaction between the fast and slow timescales: the CLT convergence rate for the last iterate improves as the timescale separation increases, while it deteriorates in the Polyak–Ruppert averaged setting. We also provide high-order moment bounds for the error of the linear TTSA algorithm, which may be of independent interest.

Gaussian Approximation for Two-Timescale Linear Stochastic Approximation
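
For readers unfamiliar with the setting, a standard way to write a linear two-timescale recursion and its Polyak–Ruppert average is given below. The notation is an assumption based on the usual textbook form and may differ from the paper's: $\theta_k$ is the slow iterate, $w_k$ the fast iterate, $\beta_k$ and $\gamma_k$ the step sizes, and $V_{k+1}$, $W_{k+1}$ the noise terms.

```latex
\begin{align*}
\theta_{k+1} &= \theta_k + \beta_k \bigl( b_1 - A_{11}\theta_k - A_{12} w_k + V_{k+1} \bigr), \\
w_{k+1}      &= w_k + \gamma_k \bigl( b_2 - A_{21}\theta_k - A_{22} w_k + W_{k+1} \bigr), \\
\bar{\theta}_n &= \frac{1}{n} \sum_{k=1}^{n} \theta_k,
\qquad \frac{\beta_k}{\gamma_k} \to 0 \ \text{as } k \to \infty .
\end{align*}
```

The condition $\beta_k / \gamma_k \to 0$ encodes the timescale separation: $w_k$ moves on the faster scale and $\theta_k$ on the slower one.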

To relieve the intensive human expertise required to design optimization algorithms, recent Meta-Black-Box Optimization (MetaBBO) research leverages the generalization strength of meta-learning to train neural-network-based algorithm design policies over a predefined training problem set, which automates the adaptation of low-level optimizers to unseen problem instances. Currently, a common choice of training problem set in existing MetaBBO methods is the well-known benchmark suite CoCo-BBOB. Although this choice facilitates MetaBBO development, problem instances in CoCo-BBOB are limited in diversity, raising the risk of overfitting, which may in turn result in poor generalization. In this paper, we propose an instance generation approach, termed **LSRE**, which generates diverse training problem instances so that MetaBBO methods can learn more generalizable policies. LSRE first trains an autoencoder that maps high-dimensional problem features into a 2-dimensional latent space. Uniform-grid sampling in this latent space yields hidden representations of problem instances with sufficient diversity. By leveraging a genetic-programming approach to search for function formulas with minimal L2 distance to these hidden representations, LSRE reverse-engineers a diversified problem set, termed **Diverse-BBO**. We validate the effectiveness of LSRE by training various MetaBBO methods on Diverse-BBO and observing their generalization performance in both synthetic and realistic scenarios. Extensive experimental results underscore the superiority of Diverse-BBO over existing training set choices. Further ablation studies not only demonstrate the effectiveness of the design choices in LSRE, but also reveal interesting insights into instance diversity and MetaBBO generalization. We provide the code of LSRE and Diverse-BBO at https://github.com/MetaEvo/Diverse-BBO.
