Singapore


Reliable clinical decision support requires medical AI agents capable of safe, multi-step reasoning over structured electronic health records (EHRs). While large language models (LLMs) show promise in healthcare, existing benchmarks inadequately assess performance on action-based tasks involving threshold evaluation, temporal aggregation, and conditional logic. We introduce ART, an Action-based Reasoning clinical Task benchmark for medical AI agents, which mines real-world EHR data to create challenging tasks targeting known reasoning weaknesses. Through analysis of existing benchmarks, we identify three dominant error categories: retrieval failures, aggregation errors, and conditional logic misjudgments. Our four-stage pipeline (scenario identification, task generation, quality audit, and evaluation) produces diverse, clinically validated tasks grounded in real patient data. Evaluating GPT-4o-mini and Claude 3.5 Sonnet on 600 tasks shows near-perfect retrieval after prompt refinement, but substantial gaps in aggregation (28-64%) and threshold reasoning (32-38%). By exposing failure modes in action-oriented EHR reasoning, ART advances toward more reliable clinical agents, an essential step for AI systems that reduce cognitive load and administrative burden, supporting workforce capacity in high-demand care settings.

AAAI 2026

ART: Action-based Reasoning Task Benchmarking for Medical AI Agents

workshop paper
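
As a rough illustration of the task type the ART abstract above describes, the sketch below chains retrieval, temporal aggregation, threshold evaluation, and conditional logic over a toy list of readings. Everything here is an assumption made for illustration: the function name `decide_action`, the 24-hour window, the threshold of 180, and the action labels do not come from the benchmark itself.

```python
from datetime import datetime, timedelta
from statistics import mean


def decide_action(readings, now, window_hours=24, threshold=180.0):
    """Pick a hypothetical action from (timestamp, value) readings.

    Not the ART pipeline: a toy stand-in for an action-based EHR task.
    """
    cutoff = now - timedelta(hours=window_hours)
    # Retrieval: keep only readings inside the look-back window.
    recent = [value for ts, value in readings if cutoff <= ts <= now]
    if not recent:
        return "no_data"
    # Temporal aggregation: average the retrieved values.
    average = mean(recent)
    # Threshold evaluation plus conditional logic: choose the action.
    return "flag_for_review" if average > threshold else "no_action"


now = datetime(2026, 1, 20, 12, 0)
readings = [
    (datetime(2026, 1, 18, 9, 0), 250.0),   # older than 24 h, ignored
    (datetime(2026, 1, 19, 18, 0), 190.0),
    (datetime(2026, 1, 20, 8, 0), 200.0),
]
result = decide_action(readings, now)
print(result)
```

An agent can fail at any of the three commented steps independently, which is why the abstract reports retrieval, aggregation, and threshold reasoning as separate scores.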

We are pleased to announce the Fortieth AAAI Conference on Artificial Intelligence (AAAI-26), which will be held at Singapore EXPO from January 20 to January 27, 2026.

The purpose of the AAAI conference series is to promote research in Artificial Intelligence (AI) and foster scientific exchange between researchers, practitioners, scientists, students, and engineers across the entirety of AI and its affiliated disciplines. AAAI-26 will feature technical paper presentations, special tracks, invited speakers, workshops, tutorials, poster sessions, senior member presentations, competitions, exhibit programs, and a range of other activities to be announced.


Teaching machine learning (ML) workflows to non-programmers
remains a challenge in introductory AI courses.
Traditionally, educators have turned to no-code tools such
as KNIME to lower barriers. With the rise of generative AI
(GenAI), students can now construct ML pipelines through
natural language prompts, potentially offering a new
“no-code” pathway. In a polytechnic-wide elective in
Singapore, students were given the choice of using either
KNIME or a GenAI chatbot for practical exercises and their
semester project. Survey responses, informal interviews,
and classroom observations revealed that both tools
supported conceptual learning, but students’ experiences
diverged: KNIME provided predictability and structured
guidance, while GenAI offered speed and flexibility yet
posed setup challenges and required coding familiarity.
Students valued having a choice, though this complicated
teaching logistics. Our experience suggests that GenAI can
complement—but not yet replace—traditional no-code
platforms, and that the design of introductory activities
is critical for adoption. We share lessons learned for
educators considering GenAI as an alternative in
workflow-based ML education.

Understanding the Effects of GenAI as No-Code Alternative for Teaching Machine Learning Workflows

Generative AI has moved from pilots to everyday practice,
delivering gains in productivity and accessibility while
surfacing present-day risks—hallucinations and reliability
failures, bias and unfairness, prompt-injection attacks,
and so on. These trends make AI safety education a core
competency. In this paper, we survey global AI safety
curricula and, in the Japanese context, observe strong
policy momentum but relatively few courses that explicitly
combine capability instruction with systematic safety
evaluation. In response, we developed a 7-week,
graduate-level intensive at a private science and
engineering university in Japan, with enrollment open to
international exchange students at both the undergraduate
and graduate levels. The curriculum progresses from
machine-learning foundations to generative models and
alignment, with introductory agent topics included to
support risk reasoning. Delivery combines weekly lectures,
invited talks from academia and industry, structured group
discussions, and a final presentation plus a paper-style
final project focused on risk evaluation and mitigation
planning. An end-of-course survey indicates high perceived
learning and a positive experience, and one student project
later resulted in a peer-reviewed workshop paper at ICLR 2025.

Beyond Prompting: AI Safety Education in the Generative AI Era

Research on how the popularization of generative Artificial
Intelligence (AI) tools is impacting learning environments
has led to hesitancy among educators to teach these tools
in classrooms, creating two observed disconnects.
Generative AI competency is increasingly valued in industry
but not in higher education, and students are experimenting
with generative AI without formal guidance. Grounded in
learning sciences literature, the authors believe students
across fields must be taught to responsibly and expertly
harness the potential of AI tools to ensure job market
readiness and positive outcomes. Computer Science
trajectories are particularly impacted, and while
consistently top-ranked U.S. Computer Science departments
teach the mechanisms and frameworks underlying AI, few
appear to offer courses on applications for existing
generative AI tools. A course was developed at a private
research university to teach undergraduate and graduate
Computer Science students applications for generative AI
tools in software development. Two mixed-methods surveys
indicated that students overwhelmingly found the course valuable
and effective. Co-authored by the professor and one of the
graduate students, this paper explores the context,
implementation, and impact of the course through data
analysis and reflections from both perspectives. It
additionally offers recommendations for replication in and
beyond Computer Science departments.

Bridging the Skills Gap: A Course Model for Modern Generative AI Education

As large language models (LLMs) and chatbots become
increasingly prevalent, there is an urgent need to create
engaging, age-appropriate learning activities that foster
foundational AI literacy with a focus on natural language
processing (NLP). This paper presents the iterative design
and implementation of three instructional activities that
introduce middle school learners (ages 11 to 14) to NLP
concepts through playful, hands-on experiences aligned with
the AI4K12 Big Idea of Natural Interaction. These
activities include: (1) an unplugged card game to develop
students' understanding of embeddings and similarity, (2)
an unplugged collaborative sentence-generation challenge
that illustrates how language models work, and (3) a
web-based educational game in which students design and
interact with chatbots. Each activity has been implemented
and refined through multiple iterations in diverse
educational contexts, including teacher professional
development workshops, summer camps, and middle school
classrooms. All activities are designed to be easy to set
up, require only commonly available classroom technology
(e.g., laptops) and a few inexpensive materials (e.g.,
decks of cards), and are supported with facilitation guides
and reflection prompts. Early implementations revealed some
confusion and the need for clearer instructions, but
post-refinement surveys showed that students found the
activities both enjoyable and educational. Findings suggest
that blending unplugged and digital formats enhances
comprehension, and that tailoring content to students'
local contexts supports engagement. By making these
activities and supporting materials openly available, this
work contributes to the growing ecosystem of K–12 AI
education resources and offers practical guidance for
integrating NLP concepts into classrooms.

From Embeddings to Chatbots: Playful NLP Activities for Middle School AI Literacy

Artificial Intelligence (AI), particularly in the form of
intelligent AI agents, is transforming education, industry,
and everyday life. These agents extend the capabilities of
Large Language Models (LLMs) by integrating planning,
decision-making, tool use, and multi-agent collaboration,
enabling systems that can reason, adapt, and act in dynamic
environments. As such systems become integral to modern
workplaces and everyday problem solving, early exposure
equips high school students with systems thinking,
practical problem-solving skills, and ethical awareness,
while preparing them to create applications that address
real-world needs. Yet most high school AI programs focus on
basic model usage and overlook the skills required to
design and deploy agentic systems. Existing resources are
largely aimed at university learners and assume substantial
programming expertise, creating a significant accessibility
gap. To address this need, we present a structured
hackathon-based framework for introducing high school
students to the design and application of AI agents. The
framework combines expert-led lectures on core topics such
as agent architectures, prompting strategies, reasoning
methods, and tool-use protocols with a guided hackathon in
which students collaboratively develop domain-specific
agent-based chatbots. We provide a complete suite of
instructional materials, step-by-step tutorials, and
starter code to support hands-on learning, enabling
participants to build functional agents capable of
reasoning and interacting with external tools. Our approach
bridges the gap between AI literacy and practical
deployment while fostering creativity, collaboration, and
responsible innovation, and our findings suggest that early
engagement with agent-based AI design equips students with
both technical proficiency and the mindset to shape the
AI-driven future.

Catching the First Light of Tomorrow: A Hackathon-Based Framework for Introducing High School Students to AI Agents

Minimizing invasive diagnostic procedures is a central goal
in medical imaging. Perineural invasion (PNI), a critical
prognostic factor where tumors infiltrate nerves, remains
difficult to confirm noninvasively, as its features are
often imperceptible in conventional MRI. PNI research is
severely hampered by data scarcity. Our study utilized a
dataset collected over a decade at Samsung Medical Center
(SMC), initially comprising 306 patients. After rigorous
quality control, the final cohort included 128 T1-weighted
hepatobiliary phase MRI scans, exhibiting significant class
imbalance (44 PNI-positive/84 PNI-negative). To address
these challenges, we present NeoNet, the first integrated
end-to-end 3D deep learning framework for PNI prediction in
cholangiocarcinoma that avoids reliance on radiomics or
handcrafted features. NeoNet integrates three modules: (1)
NeoSeg, utilizing a Tumor-Localized ROI Crop (TLCR)
algorithm; (2) NeoGen, a 3D Latent Diffusion Model (LDM)
with ControlNet, conditioned on anatomical masks to
generate synthetic image patches, specifically balancing
the dataset to a 1:1 ratio; and (3) NeoCls, the final
prediction module. For NeoCls, we developed the
PNI-Attention Network (PattenNet), which uses the frozen
LDM encoder and specialized 3D Dual Attention Blocks (DAB)
designed to detect subtle intensity variations and spatial
patterns indicative of PNI. In rigorous 5-fold
cross-validation, NeoNet outperformed baseline 3D models.
By leveraging synthetic data for balanced training,
PattenNet achieved the highest performance with a maximum
AUC of 0.7903.

NeoNet: An End-to-End 3D MRI-Based Deep Learning Framework
for Non-Invasive Prediction of Perineural Invasion via
Generation-Driven Classification

Recent work has established learned k-space acquisition
patterns as a promising direction for improving reconstruction
quality in accelerated Magnetic Resonance Imaging (MRI).
Despite encouraging results, most existing research focuses
on acquisition patterns optimized for a single dataset or
modality, with limited consideration of their transferability
across imaging domains. In this work, we demonstrate
that the benefits of learned k-space sampling can extend
beyond the training domain, enabling superior reconstruction
performance under domain shifts. Our study presents
two main contributions. First, through systematic evaluation
across datasets and acquisition paradigms, we show
that models trained with learned sampling patterns exhibit
improved generalization under cross-domain settings. Second,
we propose a novel method that enhances domain robustness
by introducing acquisition uncertainty during training:
stochastically perturbing k-space trajectories to simulate
variability across scanners and imaging conditions. Our
results highlight the importance of treating k-space trajectory
design not merely as an acceleration mechanism, but as an
active degree of freedom for improving domain generalization
in MRI reconstruction.

On The Role of K-Space Acquisition in MRI Reconstruction
Domain-Generalization
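
The abstract above mentions stochastically perturbing k-space trajectories during training. The sketch below shows one simple way such a perturbation could look for a 1D Cartesian sampling pattern: each selected phase-encode line index is jittered with Gaussian noise. This is an assumption for illustration, not the authors' method; the function name `perturb_trajectory`, the Gaussian jitter, and the parameter `sigma` are all hypothetical.

```python
import random


def perturb_trajectory(line_indices, num_lines, sigma=1.0, rng=None):
    """Jitter sampled phase-encode line indices (hypothetical scheme).

    Each index is shifted by rounded Gaussian noise and clamped to the
    valid range [0, num_lines - 1]. Shifted indices that collide are
    merged, so the result may contain fewer lines than the input.
    """
    rng = rng or random.Random()
    perturbed = set()
    for idx in line_indices:
        shifted = round(idx + rng.gauss(0.0, sigma))
        perturbed.add(min(max(shifted, 0), num_lines - 1))
    return sorted(perturbed)


rng = random.Random(0)  # fixed seed so the jitter is reproducible
base = [0, 8, 16, 24, 32, 40, 48, 56]  # toy equispaced pattern, 64 lines
jittered = perturb_trajectory(base, num_lines=64, sigma=1.5, rng=rng)
print(jittered)
```

Drawing a fresh perturbation at every training step would expose the reconstruction model to slightly different sampling patterns, which is the general idea behind simulating scanner variability. Setting `sigma=0.0` leaves the pattern unchanged.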

Loneliness is a major determinant of poor health and wellbeing in older adults, yet existing interventions remain limited in accessibility, personalisation, and long-term effectiveness. Emerging technologies, including artificial intelligence (AI) and virtual reality (VR), offer new opportunities for emotionally adaptive and engaging support. This study explores how older adults perceive the potential of VR to address loneliness through a participatory focus group and interactive VR workshop. Thematic analysis revealed that participants sought technologies fostering agency, emotional regulation, and meaningful social connection, rather than existing strategies of passive distraction. We present several AI integration approaches that can actualise participants’ visions of effective loneliness support, augmenting the therapeutic capabilities of immersive technologies. These findings highlight the promise of AI-enhanced and ethically grounded VR interventions as tools for emotional growth and healthy ageing, emphasising the importance of co-design and user-centred development in future digital therapeutics for loneliness.

From Isolation to Connection: Participatory Insights on AI and VR for Healthy Ageing

This work addresses the large inter-individual variability in physiological adaptation to exercise, which limits the effectiveness of generalized programs. We introduce mechanobioAI, a knowledge-graph framework that operationalizes mechanobiology to infer adaptive predispositions and generate N-of-1, parameterized exercise prescriptions. The framework links observable performance variables such as strength, velocity, and recovery to latent mechanobiological traits including muscle–tendon stiffness, fiber-type composition, and recovery kinetics. Estimated predispositions are integrated with a quantitative model of the titin-kinase mechanosensitive switch to prescribe mechanical dosages—load, volume, frequency, and tempo—targeting specific molecular pathways. Beyond dosage, the knowledge graph also constrains exercise selection so that chosen movements instantiate the intended mechanical stimuli, aligning program design with user goals. 

By encoding mechanobiological and genetic principles directly within a knowledge graph, mechanobioAI moves beyond proxy-based AI personalization toward biologically grounded, mechanism-informed coaching. Preliminary feasibility analyses using public datasets that couple anthropometrics and performance proxies with common polymorphisms suggest that graph-inferred traits can recapitulate known genotype–phenotype links. Practically, this approach seeks to increase the speed and magnitude of beneficial adaptation, reduce injury risk through better control of mechanical exposure, and address the lack of principled personalization in current digital fitness tools. The result is a coherent pathway from cellular mechanisms to individualized training programs, enabling scalable N-of-1 prescriptions that are both interpretable and actionable.

mechanobioAI: A Mechanobiology-Driven Knowledge Graph for N-of-1 Exercise Personalization

Chronic disease management requires regular adherence feedback to prevent avoidable hospitalizations, yet clinicians lack time to produce personalized patient communications. Manual authoring preserves clinical accuracy but does not scale; AI generation scales but can undermine trust in patient-facing contexts. We present a clinician-in-the-loop interface that constrains AI to data organization and preserves physician oversight through recognition-based review. A single-page editor pairs AI-generated section drafts with time-aligned visualizations, enabling inline editing with visual evidence for each claim. This division of labor (AI organizes, clinician decides) targets both efficiency and accountability. In a pilot with three physicians reviewing 24 cases, AI successfully generated clinically personalized drafts matching physicians' manual authoring practice (overall mean 4.86/10 vs. 5.0/10 baseline), requiring minimal physician editing (mean 8.3% content modification) with zero safety-critical issues, demonstrating effective automation of content generation. However, review time remained comparable to manual practice, revealing an accountability paradox: in high-stakes clinical contexts, professional responsibility requires complete verification regardless of AI accuracy. We contribute three interaction patterns for clinical AI collaboration: bounded generation with recognition-based review via chart-text pairing, automated urgency flagging that analyzes vital trends and adherence patterns with fail-safe escalation for missed critical monitoring tasks, and progressive disclosure controls that reduce cognitive load while maintaining oversight. These patterns indicate that clinical AI efficiency requires not only accurate models, but also mechanisms for selective verification that preserve accountability.
