Singapore

Reinforcement Learning (RL) faces significant challenges in
adaptive healthcare interventions, such as dementia care,
where data is scarce, decisions require interpretability,
and underlying patient-state dynamics are complex and causal
in nature. In this work, we present a novel framework
called Causal structure-aware Reinforcement Learning (CRL)
that explicitly integrates causal discovery and reasoning
into policy optimization. This method enables an agent to
learn and exploit a directed acyclic graph (DAG) that
describes the causal dependencies between human behavioral
states and robot actions, facilitating more efficient,
interpretable, and robust decision-making.
We validate our approach in a simulated robot-assisted
cognitive care scenario, where the agent interacts with a
virtual patient exhibiting dynamic emotional, cognitive,
and engagement states. The experimental results show that
CRL agents outperform conventional model-free RL baselines
by achieving higher cumulative rewards, maintaining
desirable patient states more consistently, and exhibiting
interpretable, clinically-aligned behavior. We further
demonstrate that CRL’s performance advantage remains robust
across different weighting strategies and hyperparameter
settings. In addition, we demonstrate a lightweight
LLM-based deployment: a fixed policy is embedded into a
system prompt that maps inferred states to actions,
producing consistent, supportive dialogue without LLM
finetuning. Our work illustrates the promise of causal
reinforcement learning for human-robot interaction
applications, where interpretability, adaptiveness, and
data efficiency are paramount.
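
The LLM-based deployment described above, a fixed policy embedded into a system prompt, might look roughly like the following sketch. The state labels, action names, and prompt wording are all hypothetical; the abstract does not list them, and the actual prompt may be structured differently.

```python
# Hypothetical state labels and actions; the abstract does not list them.
POLICY = {
    ("low_mood", "low_engagement"): "offer_encouragement",
    ("low_mood", "high_engagement"): "play_familiar_music",
    ("neutral_mood", "low_engagement"): "start_memory_game",
    ("neutral_mood", "high_engagement"): "continue_current_activity",
}


def build_system_prompt(policy):
    """Render a fixed state-to-action table as plain-text instructions."""
    lines = [
        "You are a supportive care assistant.",
        "First infer the patient's state from the conversation.",
        "Then follow the matching rule below and respond warmly.",
        "",
    ]
    # Sort the rules so the prompt text is identical on every run.
    for (mood, engagement), action in sorted(policy.items()):
        lines.append(f"- If mood={mood} and engagement={engagement}: {action}")
    return "\n".join(lines)


prompt = build_system_prompt(POLICY)
print(prompt)
```

Because the policy is rendered as text, it can be swapped or audited without touching the language model itself, which is consistent with the "without LLM finetuning" claim.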

AAAI 2026

Causal Reinforcement Learning based Agent-Patient
Interaction with Clinical Domain Knowledge

workshop paper

We are pleased to announce the Fortieth AAAI Conference on Artificial Intelligence (AAAI-26), which will be held in Singapore EXPO from January 20 to January 27, 2026.

The purpose of the AAAI conference series is to promote research in Artificial Intelligence (AI) and foster scientific exchange between researchers, practitioners, scientists, students, and engineers across the entirety of AI and its affiliated disciplines. AAAI-26 will feature technical paper presentations, special tracks, invited speakers, workshops, tutorials, poster sessions, senior member presentations, competitions, and exhibit programs, and a range of other activities to be announced.


Minimizing invasive diagnostic procedures is a central goal
in medical imaging. Perineural invasion (PNI), a critical
prognostic factor where tumors infiltrate nerves, remains
difficult to confirm noninvasively, as its features are
often imperceptible in conventional MRI. PNI research is
severely hampered by data scarcity. Our study utilized a
dataset collected over a decade at Samsung Medical Center
(SMC), initially comprising 306 patients. After rigorous
quality control, the final cohort included 128 T1-weighted
hepatobiliary phase MRI scans, exhibiting significant class
imbalance (44 PNI-positive/84 PNI-negative). To address
these challenges, we present NeoNet, the first integrated
end-to-end 3D deep learning framework for PNI prediction in
cholangiocarcinoma that avoids reliance on radiomics or
handcrafted features. NeoNet integrates three modules: (1)
NeoSeg, utilizing a Tumor-Localized ROI Crop (TLCR)
algorithm; (2) NeoGen, a 3D Latent Diffusion Model (LDM)
with ControlNet, conditioned on anatomical masks to
generate synthetic image patches, specifically balancing
the dataset to a 1:1 ratio; and (3) NeoCls, the final
prediction module. For NeoCls, we developed the
PNI-Attention Network (PattenNet), which uses the frozen
LDM encoder and specialized 3D Dual Attention Blocks (DAB)
designed to detect subtle intensity variations and spatial
patterns indicative of PNI. In rigorous 5-fold
cross-validation, NeoNet outperformed baseline 3D models.
By leveraging synthetic data for balanced training,
PattenNet achieved the highest performance with a maximum
AUC of 0.7903.
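
As a worked check of the 1:1 balancing step, the sketch below computes how many synthetic samples the reported class counts would require. It assumes that balancing adds samples only to the smaller class and is applied to the whole cohort; in cross-validation the same calculation would presumably be done per training fold, which the abstract does not specify.

```python
def synthetic_needed(n_positive, n_negative):
    """Number of synthetic samples to add to the smaller class for a 1:1 ratio."""
    return abs(n_negative - n_positive)


# Class counts reported in the abstract: 44 PNI-positive, 84 PNI-negative.
n_pos, n_neg = 44, 84
extra = synthetic_needed(n_pos, n_neg)
print(extra)  # 40
```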

NeoNet: An End-to-End 3D MRI-Based Deep Learning Framework
for Non-Invasive Prediction of Perineural Invasion via
Generation-Driven Classification

Recent work has established learned k-space acquisition
patterns as a promising direction for improving
reconstruction quality in accelerated Magnetic Resonance
Imaging (MRI). Despite encouraging results, most existing
research focuses on acquisition patterns optimized for a
single dataset or modality, with limited consideration of
their transferability across imaging domains. In this work,
we demonstrate that the benefits of learned k-space
sampling can extend beyond the training domain, enabling
superior reconstruction performance under domain shifts.
Our study presents two main contributions. First, through
systematic evaluation across datasets and acquisition
paradigms, we show that models trained with learned
sampling patterns exhibit improved generalization under
cross-domain settings. Second, we propose a novel method
that enhances domain robustness by introducing acquisition
uncertainty during training, stochastically perturbing
k-space trajectories to simulate variability across
scanners and imaging conditions. Our results highlight the
importance of treating k-space trajectory design not merely
as an acceleration mechanism, but as an active degree of
freedom for improving domain generalization in MRI
reconstruction.
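
One simple way to picture "stochastically perturbing k-space trajectories" is to add random jitter to each sampling coordinate at every training step. The sketch below does this with Gaussian noise and clipping; the noise model, the `sigma` value, and the normalized coordinate range are assumptions for illustration, not details taken from the abstract.

```python
import random


def perturb_trajectory(points, sigma, k_max=0.5, rng=None):
    """Add Gaussian jitter to each (kx, ky) sample and clip to [-k_max, k_max]."""
    rng = rng or random.Random()
    perturbed = []
    for kx, ky in points:
        kx_new = min(k_max, max(-k_max, kx + rng.gauss(0.0, sigma)))
        ky_new = min(k_max, max(-k_max, ky + rng.gauss(0.0, sigma)))
        perturbed.append((kx_new, ky_new))
    return perturbed


# A toy horizontal spoke in normalized k-space coordinates.
spoke = [(-0.5 + 0.1 * i, 0.0) for i in range(11)]
noisy = perturb_trajectory(spoke, sigma=0.01, rng=random.Random(0))
print(noisy[:3])
```

Drawing a fresh perturbation per training step exposes the reconstruction model to a family of nearby trajectories rather than a single fixed one.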

On The Role of K-Space Acquisition in MRI Reconstruction
Domain-Generalization

Phenotyping from electronic health records (EHRs) requires
integrating structured signals and unstructured narratives,
yet existing multimodal models often rely on latent
embeddings that obscure reasoning and fail to capture
evolving patient dynamics. We propose a purely LLM-driven
approach that fuses modalities directly in explicit textual
space, enabling transparent knowledge representation and
interpretable reasoning for disease phenotyping. Using the
MIMIC-III dataset, we combine hospital-course sections from
discharge summaries with synthesized ICU time-series
narratives to jointly capture chronic and acute processes.
Across 25 benchmark diseases, LLM-based named entity
recognition, relation classification, and narrative
synthesis yield average F1 improvements of 10-14% over
baselines. A detailed case study on congestive heart
failure further examines the reasoning mechanisms
underlying phenotyping using an LLM-as-judge evaluation
setup, demonstrating how structured extraction, temporal
narratives, and Chain-of-Thought prompting interact to
elicit clinically faithful and auditable reasoning,
suggesting extensibility to diverse phenotypes.
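
To illustrate what fusing modalities "in explicit textual space" can mean for ICU time series, the sketch below turns a list of readings into one plain-text sentence that could sit alongside a discharge summary in a prompt. The function name, sentence template, and example values are hypothetical; the paper's narrative synthesis is LLM-based and is not reproduced here.

```python
def describe_series(name, unit, readings):
    """Summarize (hour, value) readings as one sentence of plain text."""
    if not readings:
        return f"No {name} readings were recorded."
    ordered = sorted(readings)
    (first_hour, first), (last_hour, last) = ordered[0], ordered[-1]
    values = [value for _, value in ordered]
    if last == first:
        change = f"stayed at {first} {unit} between hour {first_hour} and hour {last_hour}"
    else:
        verb = "rose" if last > first else "fell"
        change = (
            f"{verb} from {first} {unit} at hour {first_hour} "
            f"to {last} {unit} at hour {last_hour}"
        )
    return f"{name} {change} (range {min(values)}-{max(values)} {unit})."


sentence = describe_series("Heart rate", "bpm", [(0, 88), (6, 112), (12, 104)])
print(sentence)
# Heart rate rose from 88 bpm at hour 0 to 104 bpm at hour 12 (range 88-112 bpm).
```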

Enhancing LLM-Based Phenotyping via Structured Knowledge Representation and Chain-of-Thought Reasoning on EHR Data

Simulating high-fidelity patients offers a powerful avenue
for studying complex diseases while addressing the
challenges of fragmented, biased, and privacy-restricted
real-world data. In this study, we introduce SynthAgent, a
novel Multi-Agent System (MAS) framework designed to model
obesity patients with comorbid mental disorders, including
depression, anxiety, social phobia, and binge eating
disorder. SynthAgent integrates clinical and medical
evidence from claims data, population surveys, and
patient-centered literature to construct personalized
virtual patients enriched with personality traits that
influence adherence, emotion regulation, and lifestyle
behaviors. Through autonomous agent interactions, the
system simulates disease progression, treatment response,
and life management across diverse psychosocial contexts.
Evaluation of more than 100 generated patients demonstrated
that GPT-5 and Claude 4.5 Sonnet achieved the highest
fidelity as the core engine in the proposed MAS framework,
outperforming Gemini 2.5 Pro and DeepSeek-R1. SynthAgent
thus provides a scalable and privacy-preserving framework
for exploring patient journeys, behavioral dynamics, and
decision-making processes in both medical and
psychological domains.

SynthAgent: A Multi-Agent LLM Framework for Realistic
Patient Simulation - A Case Study in Obesity with Mental
Health Comorbidities

Immunohistochemistry (IHC) is crucial for cancer diagnosis
to predict treatment response and determine prognosis.
However, IHC analysis remains heavily manual,
time-intensive, and prone to variability. Prior Artificial
Intelligence (AI) models for IHC analysis were based on a
single modality, while state-of-the-art (SOTA)
vision-language models (VLMs), such as CLIP-based PLIP,
have shown great promise in other medical analysis tasks
such as Hematoxylin and Eosin (H&E)-stained histopathology
classification. However, their hard contrastive loss makes
them prone to false negatives. We propose HIS2LIP, the
first VLM for multimodal IHC biomarker-stained images and
text pairs. HIS2LIP improves upon the SOTA CLIP-based
models by exploiting a novel Weighted Inter- and
Intra-modal Soft Embeddings Contrastive Loss (WISE-CL) to
mitigate false negatives, and leveraging domain-specific
contextual information (e.g., tissue types and biomarkers)
to improve representation learning. We present three
HIS2LIP variants: HIS2CLIP, HIS2PLIP, and HIS2BiomedCLIP,
based respectively on CLIP, PLIP, and BioMedCLIP backbones.
We fine-tune HIS2LIP variants on MIHIC, the largest
publicly available dataset of IHC image patches, enriching
it with high-quality captions generated using Large
Language Models GPT-4 and Llama-3 in close collaboration
with an expert pathologist who also meticulously validated
the generated captions. An exhaustive evaluation against
SOTA VLMs demonstrates the superiority of HIS2LIP variants,
achieving multiplicative gains in zero-shot classification
of up to 3.5× over CLIP on the MIHIC dataset, and
improvements of 3.21×, 1.16×, and 1.28× in image-to-text,
text-to-text, and image-to-image retrieval, respectively.
Moreover, HIS2LIP reduces patch-level analysis time from
∼15 minutes to ∼6 seconds, while maintaining high accuracy.
Code available at:
https://github.com/his2lip/his2lip-HIS2LIP-model--for-IHC-Image-Analysis.

HIS2LIP: Leveraging Weighted Inter- and Intra-modal Soft
Embeddings Contrastive Loss for Fine-Grained IHC Image
Analysis

Liver tumor segmentation, dynamic enhancement regression,
and classification are critical for clinical assessment and
diagnosis. However, no prior work has attempted to achieve
these tasks simultaneously in an end-to-end framework,
primarily due to the lack of an effective framework that
captures inter-task relevance for mutual improvement and
the absence of a mechanism to effectively extract dynamic
MRI information. To address these challenges, we propose
the Multi-Task Interaction adversarial learning Network
(MTI-Net), a novel integrated framework designed to tackle
these tasks simultaneously. MTI-Net incorporates
Multi-domain Information Entropy Fusion (MdIEF), which
utilizes entropy-aware, high-frequency spectral information
to effectively integrate features from both frequency and
spectral domains, enhancing the extraction and utilization
of dynamic MRI data. The network also introduces a task
interaction module that establishes higher-order
consistency between segmentation and regression, thus
fostering inter-task synergy and improving overall
performance. Additionally, we designed a novel task-driven
discriminator (TDD) to capture internal high-order
relationships between tasks. For dynamic MRI information
extraction, we employ a shallow Transformer network to
perform positional encoding, which captures the
relationships within dynamic MRI sequences. In experiments
on a dataset of 238 subjects, MTI-Net demonstrates high
performance across multiple tasks, indicating its strong
potential for assisting in the clinical assessment of liver
tumors. The source code will be publicly available.

Adversarial Multi-Task Learning for Liver Tumor
Segmentation, Dynamic Enhancement Regression, and
Classification

In pharmacovigilance, analyzing drug safety cases is often
time-consuming due to the abundance of laboratory data,
complex medical histories, and intricate temporal
relationships. Agentic AI systems can significantly reduce
case processing time by assisting medical reviewers in
surfacing clinically relevant evidence. However, previous
studies have highlighted that large language models alone
lack causal reasoning and evidence-based interpretability.

To address these limitations, we present DRAGON, a
knowledge-grounded safety case analysis framework that
integrates disproportionality analysis to generate and
prioritize potential adverse event hypotheses. The system
demonstrates how structured medical knowledge and
statistical evidence can be combined to support a reliable,
explainable case assessment and can be readily extended
with causal inference modules for deeper clinical reasoning.
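
Disproportionality analysis compares how often an adverse event is reported for one drug versus all other drugs. The abstract does not say which statistic DRAGON uses; the sketch below shows two standard ones, the proportional reporting ratio (PRR) and the reporting odds ratio (ROR), computed from a 2x2 table of report counts. The example counts are made up.

```python
def prr(a, b, c, d):
    """Proportional reporting ratio: [a / (a + b)] / [c / (c + d)].

    a: reports with the drug and the event
    b: reports with the drug, without the event
    c: reports with other drugs and the event
    d: reports with other drugs, without the event
    """
    if a + b == 0 or c == 0:
        raise ValueError("PRR is undefined for these counts")
    return (a * (c + d)) / (c * (a + b))


def ror(a, b, c, d):
    """Reporting odds ratio: (a / b) / (c / d)."""
    if b == 0 or c == 0:
        raise ValueError("ROR is undefined for these counts")
    return (a * d) / (b * c)


# Made-up counts: the event appears in 20% of reports for the drug
# and in 5% of reports for all other drugs.
print(prr(20, 80, 100, 1900))  # 4.0
print(ror(20, 80, 100, 1900))  # 4.75
```

Values well above 1 flag a drug-event pair as a candidate hypothesis to prioritize; they indicate reporting disproportion, not causation.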

DRAGON: DRug safety Agents Using Graphs and ONtologies

When analyzing a mental health conversation between a
counselor and a client, one examines the semantics
underlying the utterances to understand whether the
counselor has practiced the appropriate psychotherapy
techniques at different points of the conversation. Despite
the many breakthroughs in solving NLP tasks,
state-of-the-art large language models (LLMs) still perform
poorly on this utterance label prediction task. While a
simple supervised learning architecture combining an
utterance encoder with a linear softmax layer can yield
better accuracy, the trained classifiers still suffer from
poor quality ground truth labels assigned by human
annotators. Motivated by this observation, we propose a
quality-aware framework that derives quality weights of
ground truth utterance labels, trains a target classifier
in two stages, and evaluates the target classifier with
quality weights. Our experiments on three mental health
conversation datasets show that target classifiers trained
using our framework yield significantly improved accuracy
over classifiers trained without quality weights, even
outperforming strong LLMs that use direct prompting.
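
Evaluating "with quality weights" can be read as a weighted accuracy, where each utterance counts in proportion to how trustworthy its ground-truth label is. The sketch below shows that reading; how the paper actually derives the weights is not described in the abstract, and the labels and weights here are invented.

```python
def weighted_accuracy(predictions, labels, weights):
    """Fraction of total label-quality weight that the classifier gets right."""
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    correct = sum(w for p, y, w in zip(predictions, labels, weights) if p == y)
    return correct / total


# Invented utterance labels; the third label is considered low quality.
predictions = ["reflect", "question", "reflect", "advise"]
labels = ["reflect", "question", "advise", "advise"]
weights = [1.0, 0.5, 0.25, 0.25]

print(weighted_accuracy(predictions, labels, weights))    # 0.875
print(weighted_accuracy(predictions, labels, [1.0] * 4))  # 0.75
```

With uniform weights the measure reduces to ordinary accuracy, so the second call shows the unweighted baseline for the same predictions.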

Not All Labels Are Equal: On Predicting Utterance Labels in
Mental Health Conversation Data

Recent studies in pathology foundation models have shown
that scaling training data, diversifying cancer types, and
increasing model size consistently improve their
performance. However, giga-scale foundation models, which
are trained on hundreds of thousands of slides covering
tens of cancer types and contain billions of parameters,
pose significant challenges for practical use due to their
tremendous computational costs in both development and
deployment. In this work, we present a novel strategy,
named the G2L framework, to increase the performance of
large-scale foundation models, which consist of only 15% of
the parameters of giga-scale models, to a comparable
performance level of giga-scale models in cancer-specific
tasks. Our approach applies knowledge distillation,
transferring the capabilities of a giga-scale model to a
large-scale model, using just 1K pathology slides of a
target cancer (e.g., breast, prostate, etc.). The resulting
distilled model not only outperformed state-of-the-art
models of the same size (i.e., large-scale) across several
benchmarks but also, interestingly, surpassed the
giga-scale teacher and huge-scale models in some
benchmarks. In addition, the distilled model exhibited a
higher robustness index, indicating improved resilience to
image variations originating from multiple institutions.
These findings suggest that the proposed distillation
approach for a large-scale model is a data- and
parameter-efficient way to achieve giga-scale-level
performance for cancer-specific applications without
prohibitive computational burden.
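
For readers unfamiliar with knowledge distillation, the sketch below shows the classic logit-based form: the student is trained to match the teacher's temperature-softened output distribution. This is an assumption for illustration only. The abstract does not specify the G2L objective, and foundation-model distillation often matches features rather than class logits.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax of logits divided by a temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T squared."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl


teacher = [4.0, 1.0, -2.0]
print(distillation_loss([3.5, 1.2, -1.0], teacher))  # small: student is close
print(distillation_loss([-2.0, 1.0, 4.0], teacher))  # large: student disagrees
```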

G2L: From Giga-Scale to Cancer-Specific Large-Scale
Pathology Foundation Models via Knowledge Distillation

Lung adenocarcinoma is the most common histological subtype
of lung cancer. The pathological diagnosis of invasion is
very important to predict patients’ prognoses, but
interobserver variability among pathologists remains a
major issue. In this study, we propose a two-step deep
learning framework for prognostic and clinicopathological
prognostic factor prediction, in which dot-style images are
generated from high-resolution EVG-stained pathological
images and then used as model inputs. Prognosis and seven
prognostic factors are each treated as independent binary
classification tasks. The generated dot-style images
exhibit severe class imbalance. To address class imbalance
in classification tasks, we extend the existing (Cross
Entropy–based) Logit-Adjusted Loss (LA-CE) and propose a
new loss function called Logit-Adjusted Dice Loss
(LA-Dice). LA-Dice integrates the batch-level direct
optimization of the F1-score property of Dice loss with a
logit-adjustment mechanism based on the class prior
distribution of the training data, thereby enhancing the
learning of the minority class. As a result, LA-Dice
simultaneously overcomes (i) the majority-class bias
inherent in CE-based losses caused by pointwise
minimization and (ii) the inability of Cross Entropy-based
losses to directly optimize the F1-score, which is critical
for medical image classification. In our experiments, the
proposed framework achieved an average accuracy of 89.2% in
dot-style image generation. Furthermore, by using the
generated dot-style images, binary classifications for
prognosis and the seven prognostic factors were evaluated.
When averaged across all eight tasks, the proposed LA-Dice
achieved the highest mean F1-score (×100) of 58.0, showing
improvements of +6.0 over Cross Entropy, +3.3 over Dice,
and +3.3 over LA-CE. Similarly, the proposed LA-Dice
achieved the highest average mean accuracy of 59.0%,
outperforming CE by +3.7 points, Dice by +2.5 points, and
LA-CE by +2.4 points.
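
The idea of combining a batch-level Dice loss with a prior-based logit adjustment can be sketched for the binary case as follows. This is one plausible reading, not the paper's definition: the additive shift `tau * log(prior / (1 - prior))`, the soft Dice form, and the parameter names are assumptions.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def la_dice_loss(logits, labels, positive_prior, tau=1.0, eps=1e-7):
    """Soft Dice loss over a batch, computed on prior-adjusted logits."""
    # Shift every logit by the log-odds of the positive class prior.
    shift = tau * (math.log(positive_prior) - math.log(1.0 - positive_prior))
    probs = [sigmoid(z + shift) for z in logits]
    intersection = sum(p * y for p, y in zip(probs, labels))
    denominator = sum(probs) + sum(labels)
    return 1.0 - (2.0 * intersection + eps) / (denominator + eps)


logits = [2.0, -1.0, 0.5, -2.0]
labels = [1, 0, 1, 0]
print(la_dice_loss(logits, labels, positive_prior=0.5))  # no shift: plain soft Dice
print(la_dice_loss(logits, labels, positive_prior=0.1))  # rare positives: larger loss
```

When positives are rare, the shift is negative, so the same raw logits incur a larger training loss and the model is pushed toward a wider margin on the minority class.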

Prognosis and Clinicopathological Prognostic Factor Prediction in Lung Adenocarcinoma Using the Logit-Adjusted Dice Loss: A New Loss Function for Addressing Class Imbalance
