Multi-Hop Question Answering (MHQA) requires step-by-step reasoning across multiple pieces of information to answer complex questions. Cache-aided Retrieval-Augmented Generation (RAG) can accelerate external knowledge retrieval at each reasoning step of MHQA. However, existing methods focus on the cache's internal structure and ignore the misalignment between the order in which queries arrive and the order in which they hit the cache. To tackle this, we propose Mnemosyne, a cache hit order fitting method designed to accelerate the RAG process for MHQA. Specifically, our cache-aware order fitting strategy adjusts the arrival order of queries via graph reordering to better align it with the cache hit order, thereby reducing the likelihood of failed or unproductive retrieval attempts. A multi-granularity caching storage mechanism loosens the strict hit condition into multiple modes of approximate semantic matching, so that relevant documents can still be retrieved when an exact match is absent. Experiments on four multi-hop QA datasets demonstrate that Mnemosyne effectively reduces retrieval latency while improving answer F1 score, achieving a superior trade-off between efficiency and effectiveness.
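The abstract does not specify Mnemosyne's implementation, but the idea of loosening a cache's strict (exact-match) hit condition to semantic matching can be sketched minimally. Below, a toy `SemanticCache` (a hypothetical name, not from the paper) treats a lookup as a hit whenever a stored query embedding is similar enough to the incoming one, rather than requiring identical query strings; the cosine threshold and plain-list storage are illustrative assumptions only.

```python
import math

def cosine(u, v):
    # Cosine similarity between two equal-length vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(a * a for a in v))
    return dot / (nu * nv) if nu and nv else 0.0

class SemanticCache:
    """Toy semantic cache: a lookup hits if ANY stored query embedding
    is within a similarity threshold of the incoming query embedding,
    instead of demanding an exact key match (illustrative sketch only).
    """
    def __init__(self, threshold=0.9):
        self.threshold = threshold
        self.entries = []  # list of (query_embedding, retrieved_docs)

    def get(self, query_emb):
        # Return the docs of the most similar cached query, or None on a miss.
        best_docs, best_sim = None, self.threshold
        for emb, docs in self.entries:
            sim = cosine(query_emb, emb)
            if sim >= best_sim:
                best_docs, best_sim = docs, sim
        return best_docs

    def put(self, query_emb, docs):
        self.entries.append((query_emb, docs))

cache = SemanticCache(threshold=0.9)
cache.put([1.0, 0.0, 0.0], ["doc A"])
# A slightly different query embedding still hits the cache...
hit = cache.get([0.95, 0.05, 0.0])
# ...while a dissimilar one misses and would trigger real retrieval.
miss = cache.get([0.0, 1.0, 0.0])
```

In a real system the embeddings would come from a sentence encoder and the linear scan would be replaced by an approximate nearest-neighbor index; a multi-granularity variant could store keys at several levels (full question, sub-question, entity) and try each in turn.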