Traditionally, AI research in medical diagnosis has largely centered on image analysis. While this has led to notable advancements, the absence of patient-reported symptoms continues to limit diagnostic accuracy. To address this, we propose a Pre-Consultation Dialogue framework that mimics real-world diagnostic procedures, in which doctors iteratively query patients before reaching a conclusion. Specifically, we simulate diagnostic dialogues between two vision–language models (VLMs): a DoctorVLM, which generates follow-up questions based on the image and dialogue history, and a PatientVLM, which responds using a symptom profile derived from the ground-truth diagnosis. We additionally conducted a small-scale clinical validation of the synthetic symptoms generated by our framework, confirming their usefulness for diagnosis. These DoctorVLM–PatientVLM interactions yield realistic, multi-turn dialogues paired with images and diagnoses, which are then used to fine-tune the DoctorVLM. This dialogue-based training substantially enhances diagnostic performance: with Qwen2.5-VL-7B as the base model, fine-tuning on dialogues with symptoms generated by our framework achieves an F1 score of 81.0% on the DermaMNIST dataset, compared to just 56.5% with direct image-only fine-tuning.
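The DoctorVLM–PatientVLM interaction described above can be sketched as a simple simulation loop. This is a minimal illustration, not the authors' implementation: the two model calls are stubbed out (in practice each would query a VLM such as Qwen2.5-VL-7B with the image and dialogue history), and the turn limit, symptom profile, and diagnosis label are hypothetical placeholders.

```python
# Hedged sketch of the Pre-Consultation Dialogue loop.
# doctor_turn and patient_turn stand in for VLM calls; all
# specific values (turn limit, labels) are illustrative only.

def doctor_turn(image, history):
    """DoctorVLM stub: ask a follow-up question or commit to a diagnosis."""
    if len(history) >= 3:  # assumed stopping rule for the sketch
        return {"type": "diagnosis", "text": "melanocytic nevus"}
    return {"type": "question", "text": f"Follow-up question #{len(history) + 1}"}

def patient_turn(question, symptom_profile):
    """PatientVLM stub: answer from a symptom profile derived from the label."""
    return f"Answer drawn from profile: {symptom_profile}"

def simulate_dialogue(image, symptom_profile):
    """Run doctor/patient turns until the doctor commits to a diagnosis."""
    history = []
    while True:
        turn = doctor_turn(image, history)
        if turn["type"] == "diagnosis":
            return history, turn["text"]
        answer = patient_turn(turn["text"], symptom_profile)
        history.append((turn["text"], answer))

history, dx = simulate_dialogue(image=None, symptom_profile="itchy, growing lesion")
print(len(history), dx)  # → 3 melanocytic nevus
```

The resulting (image, dialogue history, diagnosis) triples are what the framework uses as fine-tuning data for the DoctorVLM.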