Singapore

Dynamic retrieval-augmented generation (RAG) enables large language models (LLMs) to fetch external knowledge on demand, improving adaptability over static RAG. A key challenge in this setting is determining when retrieval should occur. Prior methods typically trigger retrieval based on low confidence in individual tokens, which can result in delayed intervention after errors have already occurred. We propose the Entropy-Trend Constraint (ETC), a training-free method that selects optimal retrieval timing by modeling the dynamics of token-level uncertainty. Specifically, ETC leverages first- and second-order differences of the entropy sequence to capture emerging uncertainty trends, enabling earlier and more precise retrieval. Experiments across six QA benchmarks and three LLM backbones show that ETC consistently outperforms strong baselines while reducing retrieval frequency. It is especially effective in domain-specific settings, demonstrating robust generalization. Further ablation studies and qualitative analysis confirm that trend-aware uncertainty modeling leads to more effective retrieval timing. Our approach is plug-and-play, model-agnostic, and easy to integrate into existing decoding pipelines. Code is provided in the supplementary materials.

AAAI 2026

Modeling Uncertainty Trends for Timely Retrieval in Dynamic RAG

question answering; (large) language models; other

technical paper
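As a rough illustration of the trend-based idea described in the ETC abstract above, the following Python sketch computes token-level entropies and their first- and second-order differences, then flags the first decoding step at which uncertainty is both rising and accelerating. The trigger rule, the function names, and the `d1_threshold` and `d2_threshold` parameters with their default values are assumptions made for this example; the abstract does not specify them, so this should not be read as the authors' algorithm.

```python
import math


def token_entropy(probs):
    """Shannon entropy (in nats) of one next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def differences(seq):
    """Differences between consecutive elements of a sequence."""
    return [b - a for a, b in zip(seq, seq[1:])]


def first_trigger_step(entropies, d1_threshold=0.2, d2_threshold=0.0):
    """Return the first step whose entropy is rising and accelerating, else None.

    Hypothetical rule (an assumption, not taken from the paper): trigger at
    step t when H[t] - H[t-1] > d1_threshold and
    H[t] - 2*H[t-1] + H[t-2] > d2_threshold.
    """
    d1 = differences(entropies)  # d1[i] = H[i+1] - H[i]
    d2 = differences(d1)         # d2[i] = H[i+2] - 2*H[i+1] + H[i]
    for i, accel in enumerate(d2):
        step = i + 2             # d2[i] is centred on the window ending at step i+2
        if d1[i + 1] > d1_threshold and accel > d2_threshold:
            return step
    return None


# Entropy stays flat, then starts climbing faster and faster.
entropies = [0.5, 0.5, 0.6, 1.0, 1.8]
trigger = first_trigger_step(entropies)
```

In a decoding loop, `entropies` would be built by calling `token_entropy` on each step's next-token distribution, and retrieval would be issued once `first_trigger_step` returns a step index. A rule that thresholds only the current entropy reacts to the level of uncertainty, whereas this sketch reacts to its slope and curvature.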

We are pleased to announce the Fortieth AAAI Conference on Artificial Intelligence (AAAI-26), which will be held at Singapore EXPO from January 20 to January 27, 2026.

The purpose of the AAAI conference series is to promote research in Artificial Intelligence (AI) and foster scientific exchange between researchers, practitioners, scientists, students, and engineers across the entirety of AI and its affiliated disciplines. AAAI-26 will feature technical paper presentations, special tracks, invited speakers, workshops, tutorials, poster sessions, senior member presentations, competitions, exhibit programs, and a range of other activities to be announced.




Accurate short-term precipitation forecasting is critical for weather-sensitive decision-making in agriculture, transportation, and disaster response. Existing deep learning approaches often struggle to balance global structural consistency with local detail preservation, especially under complex meteorological conditions. We propose DuoCast, a dual-diffusion framework that decomposes precipitation forecasting into low- and high-frequency components modeled in orthogonal latent subspaces. We theoretically prove that this frequency decomposition reduces prediction error compared to conventional single-branch U-Net diffusion models. In DuoCast, the low-frequency model captures large-scale trends via convolutional encoders conditioned on weather front dynamics, while the high-frequency model refines fine-scale variability using a self-attention-based architecture. Experiments on four benchmark radar datasets show that DuoCast consistently outperforms state-of-the-art baselines, achieving superior accuracy in both spatial detail and temporal evolution.

DuoCast: Duo-Probabilistic Diffusion for Precipitation Nowcasting
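To make the low-/high-frequency decomposition mentioned in the DuoCast abstract more concrete, here is a minimal NumPy sketch that splits a 2D field into two components with a hard Fourier mask. It works directly on pixel values, whereas the abstract describes components modeled in latent subspaces, so the function name `split_frequencies`, the radial `cutoff` parameter, and the use of an FFT mask are all assumptions for illustration rather than the authors' method.

```python
import numpy as np


def split_frequencies(field, cutoff=0.1):
    """Split a 2D array into low- and high-frequency parts.

    `cutoff` is a radial frequency in cycles per pixel (Nyquist is 0.5).
    Hypothetical helper for illustration only.
    """
    fy = np.fft.fftfreq(field.shape[0])[:, None]   # vertical frequencies
    fx = np.fft.fftfreq(field.shape[1])[None, :]   # horizontal frequencies
    low_mask = np.sqrt(fx ** 2 + fy ** 2) <= cutoff

    spectrum = np.fft.fft2(field)
    low = np.fft.ifft2(spectrum * low_mask).real   # smooth, large-scale part
    high = field - low                             # remaining fine detail
    return low, high


rng = np.random.default_rng(0)
radar_frame = rng.random((16, 16))                 # stand-in for a radar image
low, high = split_frequencies(radar_frame)
```

By construction `low + high` reproduces the input exactly, and because the two parts occupy disjoint sets of Fourier coefficients their inner product is zero up to rounding error, which is one simple sense in which such components are orthogonal.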

We present a polynomial-time algorithm for exactly computing second-price pacing equilibria (SPPE) in multi-buyer auction markets with a constant number of buyers. In addition, our algorithm can efficiently find a pacing equilibrium which optimizes key metrics such as revenue or social welfare. SPPEs are central to modern advertising auctions, yet computing or even approximating them is PPAD-hard in general. To overcome this challenge in the restricted setting, we apply the cell-decomposition method. Specifically, we partition the solution space into polynomially many cells defined by hyperplanes, each corresponding to a fixed ordering of buyers' scaled valuations across goods. Within each cell, finding an equilibrium can be reduced to solving a linear program. To the best of our knowledge, our work identifies the first class of second-price pacing games for which an exact SPPE can be computed efficiently.

Pacing Equilibria in Second-Price Auctions with Few Buyers

3D teeth segmentation, involving the localization of tooth instances and their semantic categorization in 3D dental models, is a critical yet challenging task in digital dentistry due to the complexity of real-world dentition. In this paper, we propose 3DTeethSAM, an adaptation of the Segment Anything Model 2 (SAM2) for 3D teeth segmentation. SAM2 is a pretrained foundation model for image and video segmentation that has proven to be a strong backbone in various downstream scenarios. To adapt SAM2 for 3D teeth data, we render images of 3D teeth models from predefined views, apply SAM2 for 2D segmentation, and reconstruct 3D results using 2D-3D projections. Since SAM2's performance depends on input prompts and its initial outputs often have deficiencies, and given its class-agnostic nature, we introduce three lightweight learnable modules: (1) a prompt embedding generator to derive prompt embeddings from image embeddings for accurate mask decoding, (2) a mask refiner to enhance SAM2's initial segmentation results, and (3) a mask classifier to categorize the generated masks. Additionally, we incorporate Deformable Global Attention Plugins (DGAP) into SAM2's image encoder. The DGAP enhances both the segmentation accuracy and the speed of the training process. Our method has been validated on the 3DTeethSeg benchmark, achieving an IoU of 91.90% on high-resolution 3D teeth meshes, establishing a new state-of-the-art in the field.

3DTeethSAM: Taming SAM2 for 3D Teeth Segmentation

Multimodal in-context learning (ICL) is emerging as a key capability that enables large vision-language models (LVLMs) to adapt to novel tasks without parameter updates, expanding their utility across various real-world applications. However, ICL remains unstable, even with well-matched in-context demonstrations (ICDs), suggesting that LVLMs struggle to fully utilize the provided context. While existing efforts focus on prompt engineering or post‑hoc logit calibration, we instead investigate the underlying attention dynamics to overcome LVLMs' inherent limitations. We identify two critical deficits in their self-attention that impair effective ICL. To bridge the gap, we propose **Context-Aware Modulated Attention** (CAMA), a plug-and-play and training-free method that dynamically modulates an LVLM's attention logits based on the input in‑context sequence. CAMA employs a two-stage attention modulation to address both identified deficits, enhancing the focus on semantically significant tokens, particularly visual ones. Across four LVLMs and seven benchmarks, CAMA consistently outperforms vanilla models and baselines, demonstrating great effectiveness and generalization. It can also activate the desired effects of prompt engineering methods and remains robust under diverse sequence configurations. Thus, CAMA paves the way for deeper explorations of attention dynamics to advance multimodal reasoning.

Make LVLMs Focus: Context-Aware Attention Modulation for Better Multimodal In-Context Learning

We clarify the complexity of answering unions of conjunctive queries over knowledge bases formulated in the description logic $\mathcal{S}$, the extension of $\mathcal{ALC}$ with transitive roles. Contrary to what existing partial results suggested, we show that the problem is, in fact, 2ExpTime-complete; hardness already holds in the presence of two transitive roles and for Boolean conjunctive queries. We complement this result by showing that the problem remains in coNExpTime when the input query is rooted or is restricted to use at most one transitive role (but may use arbitrarily many non-transitive roles).

Revisiting Conjunctive Query Entailment for S

Despite the rapid progress of Vision Language Models (VLMs), existing benchmarks still concentrate on coarse-grained object recognition or simple relational reasoning, leaving the fine-grained and higher-order reasoning abilities of these systems largely unexamined. 
To bridge this critical evaluation gap, we introduce EmojiGrid, a novel diagnostic benchmark specifically designed to probe these fine-grained and higher-order skills. 
Leveraging the universal and semantically rich nature of emojis, we synthesize a grid‑based visual dataset paired with 29,000+ QA pairs.
Each pair is explicitly anchored in a three-level cognitive taxonomy comprising (i) Perception and Information Extraction, (ii) Relational and Structural Reasoning, and (iii) Abstraction and Advanced Cognition.
These dimensions further decompose into nine categories covering a broad range of cognitive skills, including counting, spatial relations, compositional logic, semantic sentiment, and related higher-order reasoning tasks.
Our extensive evaluation of 25 state-of-the-art open-source and proprietary VLMs reveals a significant performance gap between foundational perceptual tasks and higher-level cognitive abilities, particularly in abstraction and advanced emotional reasoning.
Notably, all models struggle with compositional logic, spatial consistency, and especially emotional and semantic understanding. 
EmojiGrid provides a quantifiable, fine-grained benchmark to diagnose VLM limitations and guides future progress toward models that can truly perceive, reason about, and interpret complex, symbol-rich visual scenes. The benchmark will be publicly released.

Beyond Counting: Evaluating Abstract and Emotional Reasoning in Vision-Language Models

Real-world multimodal misinformation often arises from mixed forgery sources, requiring dynamic reasoning and adaptive verification. However, existing methods mainly rely on static pipelines and limited tool usage, which restricts their ability to handle such complexity and diversity. To address this challenge, we propose T2Agent, a novel misinformation detection agent that incorporates an extensible toolkit with Monte Carlo Tree Search (MCTS). The toolkit consists of modular tools such as web search, forgery detection, and consistency analysis. Each tool is described using standardized templates, enabling seamless integration and future expansion. To avoid inefficiency from using all tools simultaneously, a greedy search-based selector is proposed to identify a task-relevant subset. This subset then serves as the action space for MCTS to dynamically collect evidence and perform multi-source verification. To better align MCTS with the multi-source nature of misinformation detection, T2Agent extends traditional MCTS with multi-source verification, which decomposes the task into coordinated subtasks targeting different forgery sources. A dual reward mechanism containing a reasoning trajectory score and a confidence score is further proposed to encourage a balance between exploration across mixed forgery sources and exploitation for more reliable evidence. We conduct ablation studies to confirm the effectiveness of the tree search mechanism and tool usage. Extensive experiments further show that T2Agent consistently outperforms existing baselines on challenging mixed-source multimodal misinformation benchmarks, demonstrating its strong potential as a training-free approach for enhancing detection accuracy. The code will be released.

T2Agent: A Tool-augmented Multimodal Misinformation Detection Agent with Monte Carlo Tree Search

Heterogeneous Graph Neural Networks (HGNNs) have demonstrated remarkable capabilities in capturing effective information in heterogeneous graphs, achieving outstanding performance in various learning tasks. However, the heavy dependency of HGNNs on neighbor information may result in high latency, which restricts their practicality in real-world applications. Recent studies have attempted to overcome such latency in Graph Neural Networks by distilling knowledge into student models that do not rely on graph structure. However, these approaches primarily focus on replicating teachers' predictive outcomes while neglecting the structural knowledge the teachers encode. This limitation makes such approaches less effective when graphs become complex, particularly on heterogeneous graphs. Motivated by this challenge, we propose HGKD, a novel hierarchical knowledge distillation framework that transfers both structural knowledge and predictive outcomes from HGNN teachers to a multi-layer perceptron student. Additionally, we provide two variants of HGKD that help the student learn from multiple teacher models through Pareto learning and incorporate low-cost neighbor information. We evaluate HGKD and its variants on a range of heterogeneous graph datasets. The results demonstrate that our student model achieves performance comparable to or exceeding that of HGNN teachers, despite not relying on graph structures during inference.

Pareto-Based Heterogeneous Knowledge Distillation for MLPs on Graphs

Understanding and reasoning over complex spreadsheets remain fundamental challenges for large language models (LLMs), which often struggle with intricate structures and rely solely on neural computation. In this work, we propose SheetBrain, a neuro-symbolic dual-workflow agent framework for precise and interpretable reasoning over tabular data. SheetBrain consists of an understanding module that produces a comprehensive overview of the spreadsheet, including structural summaries and query-specific analyses to guide execution; an execution module that integrates a Python sandbox with preloaded table-processing libraries and an Excel helper toolkit for effective data manipulation; and a validation module that verifies the correctness of reasoning and answers, triggering re-execution if necessary. We evaluate SheetBrain on multiple public QA and manipulation benchmarks, and introduce SheetBench, a new benchmark targeting large, multi-table, and structurally complex spreadsheets. Experimental results show that SheetBrain significantly improves reasoning performance on both existing benchmarks and the more challenging scenarios presented in SheetBench.

SheetBrain: A Neuro-Symbolic Agent for Accurate Reasoning over Complex and Large Spreadsheets

Control of off-road vehicles is challenging due to the complex dynamic interactions with the terrain. Accurate modeling of these interactions is important to optimize driving performance, but the relevant physical phenomena, such as slip, are too complex to model from first principles. Therefore, we present an offline meta-learning algorithm to construct a rapidly-tunable model of residual dynamics and disturbances. Our model processes terrain images into features using a visual foundation model (VFM), then maps these features and the vehicle state to an estimate of the current actuation matrix using a deep neural network (DNN). We then combine this model with composite adaptive control to modify the last layer of the DNN in real time, accounting for the remaining terrain interactions not captured during offline training. We provide mathematical guarantees of stability and robustness for our controller, and demonstrate the effectiveness of our method through simulations and hardware experiments with a tracked vehicle and a car-like robot. We evaluate our method outdoors on different slopes with varying slippage and actuator degradation disturbances, and compare against an adaptive controller that does not use the VFM terrain features. We show significant improvement over the baseline in both hardware experimentation and simulation.
