Singapore

As large language models (LLMs) exhibit advanced reasoning capabilities in different specialized domains, their application to legal reasoning tasks is actively being explored. The pursuit of justice in legal contexts demands not only correct outcomes but also reasoned elaboration, which necessitates deriving conclusions through logically justified and transparent argumentation. Current legal benchmarks, despite being cited as reasoning-focused, suffer from three critical limitations: conflation of factual recall with genuine inference, fragmentation of holistic reasoning processes, and neglect of reasoning process quality. To bridge these gaps, we construct MSLR, the first Chinese multi-step legal reasoning dataset centered on legal decision-making. To align with real-world legal reasoning trajectories, MSLR employs the IRAC framework (Issue-Rule-Application-Conclusion) to capture expert reasoning traces from official legal decisions. In parallel, we design a scalable human-LLM collaborative annotation pipeline that efficiently generates fine-grained step-level annotations while establishing a reusable methodological framework for multi-step reasoning datasets. Evaluation of a range of LLMs on MSLR reveals only modest performance, highlighting substantial challenges in adapting to complex legal reasoning. Further experiments show that self-initiated CoT prompts—created autonomously by the models—consistently improve reasoning coherence and output quality, outperforming human-designed CoT prompts, which often yield ambiguous results. This work contributes to the broader discourse on LLM reasoning and CoT strategies, offering practical insights and resources for future research. The dataset and code are publicly available.

AAAI 2026

Benchmarking Multi-Step Legal Reasoning and Analyzing Chain-of-Thought Effects in Large Language Models

poster

We are pleased to announce the Fortieth AAAI Conference on Artificial Intelligence (AAAI-26), which will be held in Singapore EXPO from January 20 to January 27, 2026.

The purpose of the AAAI conference series is to promote research in Artificial Intelligence (AI) and foster scientific exchange between researchers, practitioners, scientists, students, and engineers across the entirety of AI and its affiliated disciplines. AAAI-26 will feature technical paper presentations, special tracks, invited speakers, workshops, tutorials, poster sessions, senior member presentations, competitions, exhibit programs, and a range of other activities to be announced.



Cell phenotype transition refers to the changes in the morphology, function, and surface markers of cells that occur under specific environmental conditions or physiological states, based on their genomic information and external signals. This process plays an important role in development, tissue repair, and responses to external stimuli such as infection or inflammation. Traditional bioinformatics methods for addressing cell type transition often rely on hypothesis-driven models, which may not fully capture the complexity and heterogeneity of the transition processes. In this paper, we introduce DualCPT, a cell phenotype transition and differentiation model based on Markov processes. Specifically, the model consists of a classification branch and a transition branch. The transition branch identifies regulatory genes involved in cell phenotype transition and differentiation. In the classification branch, we evaluate the model’s overall performance on general cell type classification tasks using a comprehensive multi-metric evaluation framework; in the transition branch, we implement a token pruning-based approach for critical locus discovery and enhance information interaction between full-sequence contexts and prioritized regulatory sites via an improved multi-head attention mechanism. Cell phenotype transition tasks are further assessed by uncertainty quantification and confidence calibration. In particular, in gene knockout experiments, we found that knocking out important genes alters the probability of cell phenotype transition and differentiation, and knocking out a certain number of essential genes can terminate specific transition processes. Data, code, and checkpoints are publicly available at https://github.com/Ssupercoder/DualCPT.

DualCPT: Dual-branch Modeling for Cellular Phenotype Transition

Multimodal image fusion aims to integrate complementary information from multiple imaging modalities into a single, informative representation, which is crucial for applications in medical imaging and microscopy. Existing methods often face trade-offs between structural fidelity, edge preservation, and computational efficiency. In this work, we propose LightFusionNet, a lightweight dual-stream network designed to efficiently fuse multimodal images while retaining key structural, textural, and intensity features. The network leverages depthwise separable convolutions to reduce model complexity and incorporates a Predictive Context Attention (PCA) mechanism to selectively emphasize informative regions in the feature maps. Extensive experiments on benchmark medical imaging datasets, including PET-MRI, SPECT-MRI, and CT-MRI, demonstrate that our approach achieves qualitative and quantitative performance comparable to state-of-the-art fusion methods while maintaining low computational cost. The proposed method provides an effective and efficient solution for multimodal image fusion, suitable for both clinical and research applications.

LightFusionNet: Lightweight Dual-Stream Network with Predictive Context Attention for Efficient Medical Image Fusion

Models trained on data from one site often underperform at another due to distribution shift, which can impede reliable deployment for medical imaging tasks. We tackle the problem of multi-center healthcare AI, where models must generalize across hospitals with heterogeneous imaging protocols, scanner hardware, and patient demographics. We propose a principled framework based on optimal transport dataset distances (OTDD) to analyze and guide cross-site transfer. While OT-based approaches have mostly been applied to non-medical benchmark datasets, we demonstrate their utility in a multi-center healthcare setting. We (i) build OTDD distance matrices between centers from fixed image embeddings (ResNet-50, ResNet-18), (ii) cluster centers using OT distance to reveal structure, and (iii) test whether transfer performance (finetuning on a single source, testing on a target) aligns with OTDD. We find that continuous OTDD distances correlate negatively with target AUROC across source-target pairs, indicating that centers close in OTDD yield higher AUROC upon finetuning. Our results support OTDD as a center-level selector/ranker for cross-site adaptation in multi-center healthcare, particularly when evaluated with threshold-free metrics like AUROC.

Multi-Center Domain Shift in Healthcare AI: Optimal Transport-Based Quantification and Adaptation
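Step (iii) of the abstract above, checking whether OTDD distance tracks transfer AUROC, can be illustrated with a rank correlation. The following is a minimal sketch, not the authors' code: the helper names `spearman_rho` and `rank_sources` are hypothetical, the OTDD distances are assumed to be precomputed, and the numbers are placeholders rather than results from the paper.

```python
import numpy as np

def spearman_rho(x, y):
    """Spearman rank correlation (no tie correction; assumes distinct values)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # argsort of argsort turns values into 0-based ranks.
    rank_x = np.argsort(np.argsort(x))
    rank_y = np.argsort(np.argsort(y))
    return float(np.corrcoef(rank_x, rank_y)[0, 1])

def rank_sources(distance_to_target):
    """Order candidate source centers from closest to farthest in OTDD."""
    return sorted(distance_to_target, key=distance_to_target.get)

# Placeholder values for illustration only; NOT results from the paper.
toy_distance = [0.2, 0.5, 0.9, 1.4]    # precomputed OTDD per source-target pair
toy_auroc = [0.91, 0.88, 0.80, 0.74]   # target AUROC after finetuning
rho = spearman_rho(toy_distance, toy_auroc)
print(f"Spearman rho = {rho:.2f}")

print(rank_sources({"center_A": 0.9, "center_B": 0.2, "center_C": 0.5}))
```

A strongly negative `rho` would correspond to the pattern the abstract reports: sources that are closer in OTDD tend to give higher target AUROC, so `rank_sources` could serve as a simple source selector.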

Mental health challenges and cyberbullying are increasingly prevalent in digital spaces, necessitating scalable and interpretable detection systems. This paper introduces a unified multiclass classification framework for detecting ten distinct mental health and cyberbullying categories from social media data. We curate datasets from Twitter and Reddit, implementing a rigorous ‘split-then-balance’ pipeline to train on balanced data while evaluating on a realistic, held-out imbalanced test set. We conduct a comprehensive evaluation comparing traditional lexical models, hybrid approaches, and several end-to-end fine-tuned transformers. Our results demonstrate that end-to-end fine-tuning is critical for performance, with the domain-adapted MentalBERT emerging as the top model, achieving an accuracy of 0.92 and a Macro F1 score of 0.76, surpassing both its generic counterpart and a zero-shot LLM baseline. Grounded in a comprehensive ethical analysis, we frame the system as a human-in-the-loop screening aid, not a diagnostic tool. To support this, we introduce a hybrid SHAP-LLM explainability framework and present a prototype dashboard (“Social Media Screener”) designed to integrate model predictions and their explanations into a practical workflow for moderators. Our work provides a robust baseline, highlighting future needs for multi-label, clinically validated datasets at the critical intersection of online safety and computational mental health.

A Machine Learning Approach for Detection of Mental Health Conditions and Cyberbullying from Social Media
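The "split-then-balance" idea in the abstract above can be sketched as follows: split first so the held-out test set keeps its natural class imbalance, then balance only the training portion. This is a sketch under stated assumptions. The abstract does not say how balancing is done, so random oversampling is an assumption here, and the function names are hypothetical.

```python
import random
from collections import defaultdict

def stratified_split(samples, test_fraction=0.2, seed=0):
    """Split (text, label) pairs per class so the test set keeps the original imbalance."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for sample in samples:
        by_label[sample[1]].append(sample)
    train, test = [], []
    for items in by_label.values():
        items = items[:]  # copy so the caller's data is not reordered
        rng.shuffle(items)
        n_test = max(1, round(len(items) * test_fraction))
        test.extend(items[:n_test])
        train.extend(items[n_test:])
    return train, test

def oversample_to_balance(train, seed=0):
    """Assumed balancing step: duplicate minority samples up to the largest class size."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for sample in train:
        by_label[sample[1]].append(sample)
    target = max(len(items) for items in by_label.values())
    balanced = []
    for items in by_label.values():
        balanced.extend(items)
        balanced.extend(rng.choices(items, k=target - len(items)))
    rng.shuffle(balanced)
    return balanced
```

Balancing after the split matters because oversampling before it would let copies of the same post land in both train and test, inflating the reported scores.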

Accurate and reliable medical image classification is critical for clinical decision-making across diverse imaging modalities, including X-ray, CT, and MRI. Traditional convolutional neural networks often produce overconfident predictions, limiting their clinical trustworthiness. In this work, we propose an uncertainty-aware, attention-augmented neural network that integrates multi-scale SwirlAttention and FeedBackAttention modules with a Bayesian probabilistic classifier. This framework enables robust feature extraction, interpretable attention maps, and principled estimation of epistemic uncertainty. We evaluate our approach on four diverse datasets, including Diabetic Retinopathy, Kvasir, Skin Cancer, and fused multi-focal Oocyte images, covering a wide range of pathological and morphological variations. Extensive experiments demonstrate that our method outperforms state-of-the-art CNN and transformer-based baselines in terms of accuracy, calibration, and interpretability. Grad-CAM visualizations highlight clinically relevant regions, while uncertainty estimates provide actionable insights for ambiguous cases, making the framework suitable for reliable deployment in real-world clinical settings.

BUCAN: Bayesian Uncertainty-aware Classification with Attention Networks for Medical Images
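One common way to obtain an epistemic uncertainty estimate from a Bayesian classifier is to draw several stochastic predictions and split the predictive entropy into an expected-entropy part and a disagreement part. The abstract above does not state which estimator BUCAN uses, so the sketch below is an assumption for illustration, and `uncertainty_decomposition` is a hypothetical name.

```python
import numpy as np

def entropy(p, axis=-1, eps=1e-12):
    """Shannon entropy in nats along the class axis."""
    return -np.sum(p * np.log(p + eps), axis=axis)

def uncertainty_decomposition(sampled_probs):
    """Split predictive uncertainty for one input.

    sampled_probs: array of shape (n_samples, n_classes), one softmax
    vector per stochastic forward pass (e.g. per posterior weight sample).
    """
    sampled_probs = np.asarray(sampled_probs, dtype=float)
    mean_probs = sampled_probs.mean(axis=0)
    total = entropy(mean_probs)                # entropy of the averaged prediction
    aleatoric = entropy(sampled_probs).mean()  # average entropy of each sample
    epistemic = total - aleatoric              # mutual information: sample disagreement
    return total, aleatoric, epistemic
```

When the sampled predictions agree, the epistemic term is close to zero. When they disagree sharply, it grows, which is the kind of signal that could flag ambiguous cases for review.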

Recent advances in large language models (LLMs) have shown strong reasoning capabilities through large-scale pretraining and post-training reinforcement learning, as demonstrated by DeepSeek-R1. However, current post-training methods, such as Group Relative Policy Optimization (GRPO), mainly reward correctness, which is not aligned with the multi-dimensional objectives required in high-stakes fields such as medicine, where reasoning must also be faithful and comprehensive. We introduce Clinical-Objective Relative Policy Optimization (CRPO), a scalable, multi-objective, verifiable reinforcement learning method designed to align LLM post-training with clinical reasoning principles. CRPO integrates rule-based and verifiable reward signals that jointly optimize accuracy, faithfulness, and comprehensiveness without relying on human annotation. To demonstrate its effectiveness, we train Clinical-R1-3B, a 3B-parameter model for clinical reasoning. Experiments on three benchmarks demonstrate that CRPO substantially improves the truthfulness and completeness of reasoning over standard GRPO while maintaining solid accuracy improvements. This framework provides a scalable pathway to align LLM reasoning with clinical objectives, enabling safer and more collaborative AI systems for healthcare while also highlighting the potential of multi-objective, verifiable RL methods in post-training scaling of LLMs for medical domains.

Clinical-R1: Empowering Large Language Models for Faithful and Comprehensive Reasoning with Clinical Objective Relative Policy Optimization
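To make the contrast with correctness-only rewards concrete, the sketch below combines three per-response scores into one scalar and then applies the group-relative normalization used in GRPO-style training. The weighted average in `combined_reward` is an assumption for illustration, since the abstract above does not specify how CRPO combines its objectives. Both function names are hypothetical.

```python
import numpy as np

def combined_reward(accuracy, faithfulness, comprehensiveness, weights=(1.0, 1.0, 1.0)):
    """Assumed aggregation: weighted average of three scores in [0, 1]."""
    scores = np.array([accuracy, faithfulness, comprehensiveness], dtype=float)
    w = np.asarray(weights, dtype=float)
    return float(w @ scores / w.sum())

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of responses sampled for the same prompt."""
    rewards = np.asarray(rewards, dtype=float)
    return (rewards - rewards.mean()) / (rewards.std() + eps)
```

Under this sketch, a response that is correct but unfaithful receives a lower combined reward than one that is correct and faithful, so its advantage within the group drops even though a correctness-only reward would treat the two identically.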

Federated learning enables collaborative model development across medical institutions without centralizing sensitive patient data, yet existing embedding-level generative approaches often degrade under non-IID clinical heterogeneity and offer limited formal protection against gradient leakage. We introduce FedHypeVAE, a differentially private, hypernetwork-based conditional variational framework that generates client-specific decoders and priors from lightweight, trainable client codes. This bi-level formulation personalizes the generative process while ensuring privacy-preserving parameter synthesis decoupled from raw medical images. Federated optimization with differential privacy and distributional alignment strategies improves stability and cross-site generalization. The proposed framework unifies personalization, privacy, and domain adaptability within the generative layer, offering a principled solution for privacy-aware representation learning in multi-institutional medical imaging.

FedHypeVAE: Federated Learning with Hypernetwork Generated Conditional VAEs for Differentially Private Embedding Sharing

In recent years, the challenge of handling out-of-distribution (OOD) data has become central to ensuring the reliability of artificial intelligence deployed in open-world environments. This half-day tutorial at AAAI’26 is designed for students, researchers, and practitioners seeking to advance their machine learning models, and it provides an integrated overview of OOD detection and OOD generalization. The tutorial will introduce foundational concepts and theoretical principles, survey advanced methodologies across post-hoc scoring, outlier exposure, representation learning, and causality-inspired approaches, and highlight challenges together with emerging solutions. Real-world case studies across different domains will demonstrate the critical importance of addressing OOD risks in practice, while discussions will emphasize the distinctions and synergies between detection and generalization as well as open research directions. To participate effectively, attendees are expected to have basic familiarity with linear algebra, probability, and fundamental machine learning concepts, though the content is designed to remain accessible and emphasize intuitive understanding over technical formalism.

Handling Out-of-Distribution Data in the Open World: Principles and Practice for Reliable AI

Although many LLM-based agents have recently been developed, progress on agents that are capable of solving complex, real-world problems is severely limited. One such important and challenging area is that of addressing IT management tasks, including solving IT incidents, which often require extensive human expertise and effort. To enable the development of agents for these tasks, in this lab, we introduce ITBench, an open benchmark for IT automation that simulates realistic environments where agents interact with IT systems and multi-modal operational data, including logs, metrics, alerts, and traces. ITBench provides a two-tiered benchmark: a static dataset (ITBench_static) that closely mirrors the live, gym-like environment (ITBench_live). Together, they provide a testbed for benchmarking agentic systems across a host of critical challenges, such as planning and reasoning over massive and heterogeneous IT data, safety, and stochasticity in live IT systems. Through demonstrations, participants will learn about all of these challenges. ITBench_static is more beginner-friendly, enabling rapid benchmarking of agents against a smaller set of unique IT-domain challenges. ITBench_live enables more advanced researchers and practitioners familiar with IT systems to develop and test their agents against the full suite of challenging IT problems. We will guide attendees to use both ITBench_static and ITBench_live to develop baseline multi-agent systems and benchmark their performance. No prior IT or site reliability engineering (SRE) experience is required.
