Automated interpretation and reporting of chest X-rays (CXRs) holds significant promise for reducing diagnostic errors and supporting radiologists under heavy clinical workloads. However, existing methods typically rely on global visual features and token-level supervision, limiting their sensitivity to subtle abnormalities and reducing their clinical reliability. To address these challenges, we present Reflective X-ray Network (RefleXNet), which systematically integrates multi-scale visual feature fusion and anatomical relational reasoning with a targeted self-reflective learning strategy. RefleXNet first constructs multi-scale visual representations and captures anatomical context through graph-based relational modeling. Building on these representations, we introduce a targeted self-reflection strategy that uses clinically guided feedback from generated reports to selectively refine abnormality predictions and their associated region-level visual features. Extensive experiments on MIMIC-CXR demonstrate that RefleXNet consistently outperforms state-of-the-art baselines across clinical factual correctness metrics. Notably, our compact 3B-parameter model surpasses several recent models with over twice the parameter count. Additionally, RefleXNet exhibits strong generalization in zero-shot evaluation on IU-Xray compared with leading multimodal language models, highlighting its robustness and clinical effectiveness.
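The abstract names three architectural components but gives no implementation details. The following is a minimal, hypothetical PyTorch sketch of how multi-scale feature fusion, anatomical graph reasoning, and selective reflective refinement could fit together; every class name, tensor shape, and the fixed anatomical adjacency matrix are illustrative assumptions, not the authors' actual code.

```python
# Illustrative sketch only: module names, shapes, and the fixed
# adjacency matrix are assumptions, not RefleXNet's implementation.
import torch
import torch.nn as nn


class MultiScaleFusion(nn.Module):
    """Project feature maps from several backbone stages to a shared
    width and concatenate them into one multi-scale token sequence."""

    def __init__(self, in_dims, d_model=512):
        super().__init__()
        # One 1x1 projection per backbone stage.
        self.proj = nn.ModuleList(
            [nn.Conv2d(d, d_model, kernel_size=1) for d in in_dims]
        )

    def forward(self, feature_maps):  # list of (B, C_i, H_i, W_i)
        tokens = [p(f).flatten(2).transpose(1, 2)  # (B, H_i*W_i, d_model)
                  for p, f in zip(self.proj, feature_maps)]
        return torch.cat(tokens, dim=1)


class AnatomicalGraphReasoning(nn.Module):
    """One round of message passing over pooled anatomical-region
    features, using a fixed (R, R) adjacency over R regions
    (the fixed graph is an assumption here)."""

    def __init__(self, d_model, adjacency):
        super().__init__()
        # Row-normalize so each region averages over its neighbors.
        norm = adjacency.sum(-1, keepdim=True).clamp(min=1e-6)
        self.register_buffer("adj", adjacency / norm)
        self.update = nn.Linear(2 * d_model, d_model)

    def forward(self, region_feats):  # (B, R, d_model)
        messages = torch.einsum("rs,bsd->brd", self.adj, region_feats)
        return torch.relu(
            self.update(torch.cat([region_feats, messages], dim=-1))
        )


class ReflectiveRefinement(nn.Module):
    """Selectively refine only the regions flagged by report-level
    feedback, leaving the remaining regions untouched."""

    def __init__(self, d_model):
        super().__init__()
        self.refine = nn.GRUCell(d_model, d_model)

    def forward(self, region_feats, feedback_mask):
        # feedback_mask: (B, R), 1 where the generated report disagreed
        # with the abnormality prediction for that region.
        B, R, D = region_feats.shape
        flat = region_feats.reshape(B * R, D)
        refined = self.refine(flat, flat).reshape(B, R, D)
        gate = feedback_mask.unsqueeze(-1).float()
        return gate * refined + (1.0 - gate) * region_feats
```

The binary gate in ReflectiveRefinement mirrors the abstract's "selectively refine" phrasing: only regions whose generated findings conflict with the abnormality predictions are updated, so feedback is targeted rather than applied globally.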