Singapore

Probabilistic decoding in Large Language Models (LLMs) often yields inconsistent outputs, particularly on complex or long-form questions. Self-Consistency (SC) mitigates this for short-form QA by majority voting over exact strings, whereas Universal Self-Consistency (USC) and Weighted Unigram Consistency Score (WUCS) extend to long-form responses but lose accuracy on short-form benchmarks. 

We introduce $\textbf{Latent Self-Consistency (LSC)}$, which selects the most semantically consistent response using learnable token embeddings. A lightweight forward generation of summary tokens increases inference time by less than 1% and requires no changes to the model architecture.

Across 6 short-form and 5 long-form reasoning benchmarks (e.g., MATH, MMLU, TruthfulQA), LSC surpasses SC, USC and WUCS on all short-form and long-form ones on average, while maintaining negligible computational overhead. These results position LSC as a practical consistency-selection method that works reliably across answer formats.
Additionally, LSC provides well-calibrated confidence estimates, maintaining low Expected Calibration Error across both answer formats.

AAAI 2026

Latent Self-Consistency for Reliable Majority-Set Selection in Short- and Long-Answer Reasoning

etc.

nlp: sentence-level semantics

nlp: question answering

nlp: (large) language models

krr: common-sense reasoning

nlp: generation

krr: knowledge representation languages

ml: representation learning

textual inference

poster

We are pleased to announce the Fortieth AAAI Conference on Artificial Intelligence (AAAI-26), which will be held in Singapore EXPO from January 20 to January 27, 2026.

The purpose of the AAAI conference series is to promote research in Artificial Intelligence (AI) and foster scientific exchange between researchers, practitioners, scientists, students, and engineers across the entirety of AI and its affiliated disciplines. AAAI-26 will feature technical paper presentations, special tracks, invited speakers, workshops, tutorials, poster sessions, senior member presentations, competitions, and exhibit programs, and a range of other activities to be announced.<br><br>

To access this event page, you need to log in with the **email address you registered with**. <br>Access credentials will be sent to your email from Underline -  subject line "Welcome to AAAI 2026". Please be sure to check your spam email folder if you do not see an email confirmation right away.

Please log in

To access this event page, you are required to register.
Please complete your registration to continue.

We recommend reading [**the registration information**](https://aaai.org/conference/aaai/aaai-26/registration/) first.

**Online Registration Form**: https://aaai.getregistered.net/conference-2026 

Registration Required

We are pleased to announce the Fortieth AAAI Conference on Artificial Intelligence (AAAI-26), which will be held in Singapore EXPO from January 20 to January 27, 2026.

Large language models (LLMs) have significantly transformed natural language understanding and generation, but they raise privacy concerns due to potential exposure of sensitive information. Studies have highlighted the risk of information leakage, where adversaries can extract sensitive information embedded in the prompts. In this work, we introduce a novel private prediction framework for generating high-quality synthetic text with strong privacy guarantees. Our approach leverages the Differential Privacy (DP) framework to ensure worst-case theoretical bounds on information leakage without requiring any fine-tuning of the underlying models.The proposed method performs inference on private records and aggregates the resulting per-token output distributions. This enables the generation of longer and coherent synthetic text while maintaining privacy guarantees. Additionally, we propose a simple blending operation that combines private and public inference to further enhance utility. Empirical evaluations demonstrate that our approach outperforms previous state-of-the-art methods on in-context-learning (ICL) tasks, making it a promising direction for privacy-preserving text generation while maintaining high utility.

Privacy Preserving In-Context-Learning Framework for Large Language Models

Accurate 3D vehicle pose and shape reconstruction from monocular images remains a formidable challenge for autonomous driving, particularly for distant, occluded, or small objects. Existing methods often suffer from geometric ambiguity in depth estimation and structural hollowness in shape recovery, primarily due to inadequate multi-scale feature aggregation and inflexible prior modeling. To overcome these limitations, a novel framework termed MonoVPR is proposed by integrating dynamic context adaptation and progressive geometry refinement. Specifically, a Hierarchical Dual-Context Attention (HDCA) module is introduced to resolve scale-dependent degradation through gated cross-attention across multi-resolution feature maps, dynamically fusing object-centric geometric cues with scene-centric semantics. For shape refinement, the Bounded Iterative Mesh Refiner (BIMR) is developed, where template-guided deformations are progressively optimized via multi-head deformable attention and a tanh-bounded correction loop, ensuring physically plausible reconstructions. Extensive experiments on the ApolloCar3D benchmark demonstrate MonoVPR achieves state-of-the-art performance, showcasing exceptional capability in reconstructing geometrically consistent shapes and precise poses for challenging long-range and occluded scenarios.

Monocular Vehicle Pose and Shape Reconstruction via Dynamic Context Adaptation and Progressive Geometry Refinement

Automatic presentation slide generation can greatly streamline content creation. However, since preferences of each user may vary, existing under-specified formulations often lead to suboptimal results that fail to align with individual user needs. We introduce a novel task that conditions slide generation on user-specified preferences. We propose a human-behavior-inspired agentic framework, SlideTailor, that progressively generates editable slides in a user-aligned manner. Instead of requiring users to write their preferences in detailed textual form, our system only asks for a paper–slides example pair and a visual template—natural and easy-to-provide artifacts that implicitly encode rich user preferences across content and visual style. Despite the implicit and unlabeled nature of these inputs, our framework effectively distills and generalizes the preferences to guide customized slide generation. We also introduce a novel chain-of-speech mechanism to align slide content with planned oral narration. Such a design significantly enhances the quality of generated slides and enables downstream applications like video presentations. To support this new task, we construct a benchmark dataset that captures diverse user preferences, with meticulously designed interpretable metrics for robust evaluation. Experiments demonstrate the effectiveness of our proposed approach. Code and data will be released upon paper publication.

SlideTailor: Personalized Presentation Slide Generation for Scientific Papers

A major challenge in developing robust and generalizable Human Activity Recognition (HAR) systems for smart homes is the lack of large and diverse labeled datasets. Variations in home layouts, sensor configurations, and individual behaviors further exacerbate this issue. To address this, we leverage the idea of embodied AI agents—virtual agents that perceive and act within simulated environments guided by internal world models. We introduce AgentSense, a virtual data generation pipeline in which agents live out daily routines in simulated smart homes, with behavior guided by Large Language Models (LLMs). The LLM generates diverse synthetic personas and realistic routines grounded in the environment, which are then decomposed into fine-grained actions. These actions are executed in an extended version of the VirtualHome simulator, which we augment with virtual ambient sensors that record the agents’ activities. Our approach produces rich, privacy-preserving sensor data that reflects real-world diversity. We evaluate AgentSense on five real HAR datasets. Models pretrained on the generated data consistently outperform baselines, especially in low-resource settings. Furthermore, combining the generated virtual sensor data with a small amount of real data achieves performance comparable to training on full real-world datasets. These results highlight the potential of using LLM-guided embodied agents for scalable and cost-effective sensor data generation in HAR.

AgentSense: Virtual Sensor Data Generation Using LLM Agents in Simulated Home Environments

Multimodal Large Language Models (MLLMs) achieve impressive performance once optimized on massive datasets. Such datasets often contain sensitive or copyrighted content, raising significant data privacy concerns. Regulatory frameworks mandating the 'right to be forgotten' drive the need for machine unlearning. This technique allows for the removal of target data without resource-consuming retraining. However, while well-studied for text, visual concept unlearning in MLLMs remains underexplored. A primary challenge is precisely removing a target visual concept without disrupting model performance on related entities. To address this, we introduce AUVIC, a novel visual concept unlearning framework for MLLMs. AUVIC applies adversarial perturbations to enable precise forgetting. This approach effectively isolates the target concept while avoiding unintended effects on similar entities. To evaluate our method, we construct VCUBench. It is the first benchmark designed to assess visual concept unlearning in group contexts. Experimental results demonstrate that AUVIC achieves state-of-the-art target forgetting rates while incurs minimal performance degradation on non-target concepts.

AUVIC: Adversarial Unlearning of Visual Concepts for Multi-modal Large Language Models

Pedestrian trajectory prediction is critical for ensuring safety in autonomous driving, surveillance systems, and urban planning applications. While early approaches primarily focus on one-hop pairwise relationships, recent studies attempt to capture high-order interactions by stacking multiple Graph Neural Network (GNN) layers. However, these approaches face a fundamental trade-off: insufficient layers may lead to under-reaching problems that limit the model's receptive field, while excessive depth can result in prohibitive computational costs. We argue that an effective model should be capable of adaptively modeling both explicit one-hop interactions and implicit high-order dependencies, rather than relying solely on architectural depth. To this end, we propose ViTE (Virtual graph Trajectory Expert router), a novel framework for pedestrian trajectory prediction. ViTE consists of two key modules: a Virtual Graph that introduces dynamic virtual nodes to model long-range and high-order interactions without deep GNN stacks, and an Expert Router that adaptively selects interaction experts based on social context using a Mixture-of-Experts design. This combination enables flexible and scalable reasoning across varying interaction patterns. Experiments on three benchmarks (ETH/UCY, NBA, and SDD) demonstrate that our method consistently achieves state-of-the-art performance, validating both its effectiveness and practical efficiency.

ViTE: Virtual Graph Trajectory Expert Router for Pedestrian Trajectory Prediction

Many high-level multi-agent planning problems, such as multi-robot navigation and path planning, can be modeled with deterministic actions and observations. In this work, we focus on such domains and introduce the class of Deterministic Decentralized POMDPs (Det-Dec-POMDPs)—a subclass of Dec-POMDPs with deterministic transitions and observations given the state and joint actions. We then propose a practical solver, Iterative Deterministic POMDP Planning (IDPP), based on the classic Joint Equilibrium Search for Policies framework, specifically optimized to handle large-scale Det-Dec-POMDPs that existing Dec-POMDP solvers cannot handle efficiently.

Scalable Solution Methods for Dec-POMDPs with Deterministic Dynamics

Given the task of landing a ball in a goal region beyond direct reach, humans can often throw, slide, or rebound objects against the wall to attain the goal. Enabling robots to replicate such reasoning is non-trivial as it requires multi-step planning and involves a mixture of discrete and continuous action spaces, a sparse and sensitive reward structure, computationally expensive simulations, and an incomplete understanding of the environment's physics. We present PhyPlan, a physics-informed and adaptable planning framework for efficient multi-step physical reasoning. At its core, PhyPlan comprises of Generative Flow Networks (GFlowNets) and Monte Carlo Tree Search (MCTS) to explore and evaluate sequences of object interactions. GFlowNets sample discrete action sequences in proportion to their associated reward, enabling broad and reward-driven exploration of the discrete planning space. MCTS complements this by adaptively balancing the use of a fast but approximate pre-trained physics-informed dynamics predictor and costly but accurate environment rollouts, ensuring both speed and precision in planning. The known and actual physics discrepancy is captured using Gaussian Process Regression. Experiments on benchmark simulated tasks requiring composition of collisions, slides, and rebounds demonstrate that PhyPlan achieves a 45\% higher success rate and up to 3× efficiency gains over state-of-the-art model-based reinforcement learning approaches.

PhyPlan: Learning to Plan Tasks with Generalizable and Rapid Physical Reasoning for Embodied Manipulation

Large language models (LLMs) present new opportunities for creating pedagogical agents that engage in meaningful dialogue to support student learning. However, the current use of LLM systems like ChatGPT in classrooms often lacks the solid theoretical foundation found in earlier intelligent tutoring systems. To bridge this gap, we propose a framework that combines Evidence-Centered Design with Social Cognitive Theory for adaptive scaffolding in LLM-based agents focused on STEM+C learning. We illustrate this framework with Inquizzitor, an LLM-based formative assessment agent that integrates human-AI hybrid intelligence and provides feedback grounded in cognitive science principles. Our findings show that Inquizzitor delivers high-quality assessment and interaction aligned with core learning theories, offering teachers effective guidance that students value. This research underscores the potential for theory-driven LLM integration in education, highlighting the ability of these systems to provide adaptive and principled instruction.

A Theory of Adaptive Scaffolding for LLM-Based Pedagogical Agents

Embodied agents, such as robots, will need to interact in situated environments and reason over social norms to achieve naturalistic communication with humans. Reference resolution is a critical aspect of this interaction that involves handling normative expectations that come with situated interaction. However, there are no normative reasoning benchmarks for reference resolution, and it remains an understudied capability. To address this gap, we introduce SNIC (_Situated Norms in Context_): a human-validated benchmark for norm-based reference resolution (NBRR) focused on physically grounded norms that arise in everyday social contexts. We investigate state-of-the-art Large Language Models (LLMs) for normative reasoning to test how these models can extract and utilize normative principles relevant to reference resolution. We find that LLMs generally struggle to consistently extract and reason with social norms and norm conflicts for reference resolution across contexts when social norms are not explicitly given. Overall, this research highlights a blind spot for state-of-the-art LLMs and informs work on reasoning with text-based physically situated norms.

Downloads

Next from AAAI 2026

Privacy Preserving In-Context-Learning Framework for Large Language Models

Stay up to date with the latest Underline news!

PRESENTATIONS

CONFERENCES

COMPANY

RESOURCES