Singapore

Fine-tuning pretrained large language models (LLMs) lies at the core of modern AI applications. Recent advances in fine-tuning methods, such as reinforcement learning (RL), have led to substantial improvements. However, multiple studies have shown that fine-tuning often degrades model safety, even in models explicitly trained for safety. In particular, LLMs fine-tuned for reasoning consistently exhibit increased safety risks, raising concerns about their deployment. In this work, we demonstrate that reinforcement learning with verifiable rewards (RLVR), a method often combined with supervised fine-tuning (SFT), can maintain safety guardrails without compromising reasoning performance. Our empirical evaluations provide quantitative evidence supporting this claim across diverse models and settings. Additionally, we present a theoretical framework that formalizes the safety-preserving properties of RLVR, offering deeper insight.

AAAI 2026

Breaking the Safety-Capability Tradeoff: Reinforcement Learning with Verifiable Rewards Maintains Safety Guardrails in LLMs

workshop paper

We are pleased to announce the Fortieth AAAI Conference on Artificial Intelligence (AAAI-26), which will be held in Singapore EXPO from January 20 to January 27, 2026.

The purpose of the AAAI conference series is to promote research in Artificial Intelligence (AI) and foster scientific exchange between researchers, practitioners, scientists, students, and engineers across the entirety of AI and its affiliated disciplines. AAAI-26 will feature technical paper presentations, special tracks, invited speakers, workshops, tutorials, poster sessions, senior member presentations, competitions, and exhibit programs, and a range of other activities to be announced.

Privacy leakage in AI-based decision processes poses significant risks, particularly when sensitive information can be inferred. We propose a formal framework for auditing privacy leakage using abductive explanations; it identifies minimal sufficient evidence justifying model decisions and determines whether sensitive information is disclosed. Our framework formalizes both individual and system-level leakage, introducing the notion of Potentially Applicable Explanations (PAE) to identify individuals whose outcomes can shield those with sensitive features. This approach provides rigorous privacy guarantees while producing human-understandable explanations, a key requirement for auditing tools. Experimental evaluation on the German Credit Dataset illustrates how the importance of sensitive literals in the model decision process affects privacy leakage.

Beyond Verification: Abductive Explanations for Post-AI Assessment of Privacy Leakage

We explore AI-driven distributed-systems policy design by combining stochastic code generation from large language models (LLMs) with deterministic verification in a domain-specific simulator. Using a Function-as-a-Service runtime (Bauplan) and its open-source simulator (Eudoxia) as a case study, we frame scheduler design as an iterative generate-and-verify loop: an LLM proposes a Python policy, the simulator evaluates it on standardized traces, and structured feedback steers subsequent generations. This setup preserves interpretability while enabling targeted search over a large design space. We detail the system architecture and report preliminary results on throughput improvements across multiple models. Beyond early gains, we discuss the limits of the current setup and outline next steps; in particular, we conjecture that AI will be crucial for scaling this methodology by helping to bootstrap new simulators.
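The generate-and-verify loop described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not the Bauplan or Eudoxia API: `propose_policy`, `simulate`, `Feedback`, and the trace format are hypothetical stand-ins, a seeded random generator plays the role of the LLM, and a policy is reduced to a single number.

```python
import random
from dataclasses import dataclass


@dataclass
class Feedback:
    policy: float      # hypothetical one-parameter "policy"
    throughput: float  # score returned by the stand-in simulator


def propose_policy(rng, history):
    """Stand-in for the LLM: sample near the best policy seen so far."""
    if not history:
        return rng.uniform(0.0, 1.0)
    best = max(history, key=lambda f: f.throughput)
    return min(1.0, max(0.0, best.policy + rng.gauss(0.0, 0.1)))


def simulate(policy, traces):
    """Stand-in for a deterministic simulator: same inputs, same score."""
    return sum(1.0 - abs(policy - t) for t in traces) / len(traces)


def generate_and_verify(traces, rounds=20, seed=0):
    """Propose, evaluate, and feed the scored history back into the next proposal."""
    rng = random.Random(seed)
    history = []
    for _ in range(rounds):
        candidate = propose_policy(rng, history)   # stochastic generation
        score = simulate(candidate, traces)        # deterministic verification
        history.append(Feedback(candidate, score)) # structured feedback
    return max(history, key=lambda f: f.throughput)


best = generate_and_verify(traces=[0.6, 0.7, 0.8])
```

The point of the split is that randomness lives only in the proposal step; every candidate is scored by the same deterministic function, so results for a given seed are reproducible.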

AI for Distributed Systems Design: Scalable Cloud Optimization Through Repeated LLMs Sampling And Simulators

We introduce INDIMATHBENCH, a human-verified benchmark designed to evaluate mathematical theorem proving, curated using an AI-powered human-assisted pipeline for formalizing natural language problems in Lean. INDIMATHBENCH is composed of 312 formal Lean 4 theorems paired with their corresponding informal problem statements, sourced from Indian Mathematics Olympiads. Through category-based retrieval, iterative compiler feedback, and multi-model ensembles, our pipeline generates candidate formalizations that experts efficiently validate via an interactive dashboard with automated quality summaries. Evaluation across multiple frontier models demonstrates that autoformalization remains challenging, with substantial gaps between syntactic validity and semantic correctness, while theorem proving success rates remain low even with iterative refinement, demonstrating that INDIMATHBENCH presents a challenging testbed for mathematical reasoning.

IndiMathBench: Autoformalizing Mathematical Reasoning Problems with a Human Touch

Reinforcement learning (RL) commonly relies on scalar rewards with limited ability to express temporal, conditional, or safety-critical goals, and can lead to reward hacking. Temporal logic, expressible via the more general class of $\omega$-regular objectives, addresses this by precisely specifying rich behavioural properties. We address both limitations simultaneously by combining $\omega$-regular objectives with explicit constraints, allowing safety requirements and optimisation targets to be treated separately. We develop a model-based RL algorithm based on linear programming, which in the limit produces a policy maximizing the probability of satisfying an $\omega$-regular objective while also adhering to $\omega$-regular constraints within specified thresholds. Furthermore, we establish a translation to constrained limit-average problems with optimality-preserving guarantees.
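One way to read the optimisation problem in this abstract is as a constrained probability maximisation. The notation below is illustrative and not taken from the paper: $\pi$ is a policy, $\varphi$ the $\omega$-regular objective, $\psi_1, \dots, \psi_k$ the $\omega$-regular constraints, and $p_i$ their thresholds; the direction of the inequality is an assumption.

$$
\max_{\pi} \; \Pr\nolimits^{\pi}(\varphi)
\quad \text{subject to} \quad
\Pr\nolimits^{\pi}(\psi_i) \ge p_i, \qquad i = 1, \dots, k
$$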

Reinforcement Learning with $\omega$-Regular Objectives and Constraints

Clinical and regulatory discussions about trustworthy therapeutic AI speak in ethical and legal terms, while technical work reports performance through AUC, F1, PPV, or survival concordance. The mapping between these numerical summaries and the social norms they support is implicit and often incoherent. This gap is acute for therapeutic AI systems built on spatially resolved data, such as digital pathology, radiology, and spatial omics, where models guide target selection, biomarker discovery, and responder prediction. We provide a technical-normative analysis of common evaluation metrics. We formalize metric families by their aggregation operations: expectation-based (expected loss; AUC as a U-statistic over pairs), quantile/tail-based (median, upper quantiles, CVaR), supremum-type (worst-group risk, minimax regret), thresholded confusion-matrix ratios (PPV/precision, sensitivity, F1), and ranking metrics (top-$k$, average precision). For each family we identify the implicit social norm: maximizing average benefit, protecting typical patients, guarding against worst-case harms, or prioritizing top-predicted benefit. We prove an incompatibility result showing that high AUC can coexist with very low worst-group sensitivity under deployment-relevant thresholding, and we illustrate further incompatibilities via clinical examples. We then propose a metric design framework: (i) explicit normative declarations, (ii) multi-objective evaluation with subgroup and tail-risk constraints, and (iii) deployment checklists tying thresholds to institutional responsibilities. The goal is not to replace ethical debate with formulas, but to make explicit and auditable the value judgments encoded by metric choices.
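The tension between pooled AUC and worst-group sensitivity can be seen on a small synthetic example. This is not the paper's proof or data: the two groups, their scores, and the 0.5 threshold are made up for illustration, and `auc` and `sensitivity` are hypothetical helper names.

```python
from itertools import product


def auc(pos_scores, neg_scores):
    """AUC as a U-statistic: share of (positive, negative) pairs ranked correctly,
    counting ties as half."""
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos_scores, neg_scores)
    )
    return wins / (len(pos_scores) * len(neg_scores))


def sensitivity(pos_scores, threshold):
    """Share of true positives whose score reaches the decision threshold."""
    return sum(s >= threshold for s in pos_scores) / len(pos_scores)


# Synthetic scores (illustrative only): group B positives rank above every
# negative, but all of them fall below the deployment threshold.
groups = {
    "A": {"pos": [0.9] * 90, "neg": [0.1] * 90},
    "B": {"pos": [0.4] * 10, "neg": [0.2] * 10},
}
threshold = 0.5

all_pos = [s for g in groups.values() for s in g["pos"]]
all_neg = [s for g in groups.values() for s in g["neg"]]

pooled_auc = auc(all_pos, all_neg)
worst_group_sensitivity = min(
    sensitivity(g["pos"], threshold) for g in groups.values()
)
```

Here the ranking is perfect across the pooled data, yet no positive case in group B would be flagged at the chosen threshold, because AUC is threshold-free while sensitivity is not.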

From AUC to Accountability: Metric Choices, Social Norms, and the Deployment of Therapeutic AI

Immune checkpoint inhibitors (ICIs) have led to paradigm shifts in the treatment of several tumour types, yet microsatellite stable (MSS) colon cancer remains a major challenge and currently has no approvals for ICI-based therapies. Nevertheless, recent neoadjuvant trials introducing ICIs to early-stage MSS colon tumours have reported encouraging findings. To further improve ICI efficacy on this subset, it is first necessary to mechanistically understand how ICIs function in this otherwise immune-cold disease.

Previous studies profiling pre- and post-treatment ICI colorectal tumors (e.g., Feng et al., Nat Commun 2024) are compromised by significant sampling bias in post-treatment samples: our analysis reveals that responders are primarily composed of normal adjacent stroma while non-responders are predominantly tumor core. To overcome this limitation and characterize the effect of treatment in action, we utilized paired pre- and on-treatment samples from a window-of-opportunity study administering pembrolizumab (anti-PD-1) with XELOX chemotherapy in non-metastatic MSS colon cancer. Pre-treatment samples were composed of 3-4 biopsy pieces, while on-treatment samples were obtained from surgical resection following two cycles of therapy. To achieve a comprehensive spatial overview, large pieces from on-treatment samples (1 cm x 2 cm, encompassing both tumor nest and peripheral stroma) were selected for profiling. Using high-resolution spatial transcriptomics (10x Genomics Xenium), we profiled both pre- and on-treatment samples, generating ~17 million cells across 7 patients with detailed cell-state annotations.

As this is a window-of-opportunity study, clinical response evaluation was not the primary endpoint. Instead, to approximate treatment effect, we focused on CD8+ T-cell infiltration as a key indicator of immune modulation. Consistent with the immune-cold phenotype of MSS disease, only a subset of cases (2/7) showed increased intratumoural CD8+ effector T-cell infiltration during treatment. In contrast, most tumours (5/7) exhibited increased CD8+ pre-effector T cells restricted to peripheral stroma, showing an immune-excluded phenotype.

To link on-treatment CD8+ T-cell localization to local molecular signals, we developed linear regression models that predict CD8+ T-cell spatial distribution from secreted-factor expression. Distinct sets of secreted factors were found to be spatially associated with the two CD8+ T-cell populations: factors associated with infiltrating effector T cells included chemokines like CXCL9, CXCL10, and CXCL11 among others, whereas factors associated with stroma-restricted pre-effector T cells included CXCL12 and a broader collection of suppressive programmes. Our analysis identified specific fibroblast and macrophage states as the source of these secreted factors, suggesting distinct infiltration- and stroma-restricted immune neighborhoods. Furthermore, the infiltration- and exclusion-associated programmes were recapitulated in independent validation datasets, including on-treatment breast cancer samples from a window-of-opportunity trial (n=40; Bassez et al., Nat Med 2021) and treatment-naïve primary colorectal tumours.

Overall, our study provides a comprehensive, high-resolution spatial view into how neoadjuvant chemo-immunotherapy remodels the MSS tumour microenvironment. Notably, key findings, including pre-effector T cells and their associated secreted programs, were predominantly detected at the peripheral stroma, highlighting the critical importance of characterizing this often-overlooked region. By revealing distinct spatial molecular programs that govern T-cell infiltration versus stroma-restriction, our work identifies potential therapeutic strategies to overcome immune exclusion in MSS colon cancer.


Spatial profiling reveals molecular determinants of CD8+ dynamics during chemo-immunotherapy in MSS colon cancer

We present an improved methodology for gene expression imputation (GEI) in Xenium, a targeted spatial transcriptomics platform. Xenium provides single-cell resolution but measures only a limited set of genes, restricting downstream analyses. Our approach aligns spatial and reference scRNA-seq data in a common embedding space through “Symphony” and integrates them using single-cell neighborhood information. This improved recovery of genome-wide expression profiles from targeted panels, enabling more accurate downstream analysis.


Expanding Gene Expression profile in Targeted Spatial Transcriptomics through scRNA-seq data integration

The integration of spatial multi-omics data from single tissues is crucial for advancing biological research. However, a significant data imbalance impedes progress: while spatial transcriptomics data is relatively abundant, spatial proteomics data remains scarce due to technical limitations and high costs. To overcome this challenge, we propose STProtein, a novel framework leveraging graph neural networks with a multi-task learning strategy. STProtein is designed to accurately predict unknown spatial protein expression using more accessible spatial multi-omics data, such as spatial transcriptomics. We believe that STProtein can effectively address the scarcity of spatial proteomics, accelerating the integration of spatial multi-omics and potentially catalyzing transformative breakthroughs in life sciences. This tool enables scientists to accelerate discovery by identifying complex and previously hidden spatial patterns of proteins within tissues, uncovering novel relationships between different marker genes, and exploring the biological "Dark Matter".


STProtein: predicting spatial protein expression from multi-omics data

Spatial transcriptomics is an emerging technology to study gene expression and cell type interactions at spatial resolution. To assess which spatial transcriptomics platform is more suitable for studying brain samples, we performed a comparison between Visium HD and Xenium (including 5k and brain panel) on a human hypothalamus sample. Xenium brain panel and 5k datasets showed better sensitivity for transcript detection compared to Visium HD, similar to the observations from other studies. Xenium data also showed higher expression levels for cell type markers and key genes involved in appetite regulation in the hypothalamus. In addition, we observed strong correlation in cell type expression between the Xenium brain panel and 5k data. Further benchmarking against relevant snRNA-seq data showed moderate correlation for cell type expression. Surprisingly, we found comparable specificity for known cell type markers for both neuronal and non-neuronal cell types relative to snRNA-seq data, despite substantial challenges in cell segmentation based on Xenium data. However, discrepancies were observed between cell type label transfer from relevant snRNA-seq data and manual annotation.

In a pilot study, we utilized the Xenium platform to profile spatial transcriptomics on mouse hypothalamus and hindbrain, to explore sex- and age-associated gene expression differences. In total, 247 genes were assayed from the 10x brain panel, supplemented by 100 custom genes including known cell type markers and genes of interest. We applied Cellpose on 18S rRNA and DAPI staining for cell segmentation, which yielded 80,000 and 200,000 cells from hindbrain and hypothalamus, respectively, across 8 mice after quality control. With scRNA-seq approaches for downstream clustering and annotation, we were able to identify major cell types including neurons, glial cells and vascular cells. Cell type compositional and neighbourhood analysis highlighted differences in cell type abundance and neighbourhood enrichment between young mice and old mice. Differential gene expression analysis uncovered more pronounced differences associated with age as opposed to sex. In summary, our analysis demonstrated that spatial transcriptomics is a promising tool to decipher biology in the brain.


Explorations of spatial transcriptomics on brain samples for biological understanding

Understanding how gene expression evolves over time after trauma is central to modeling immune responses, yet single-cell temporal data remain sparse and heterogeneous across cell types. Using a temporal trauma scRNA-seq dataset, we formulate the task of predicting next-time gene expression from earlier observations under a cross-cell-type generalization setting. We introduce the Dynamic Consistency Index (DCI), which quantifies how consistently a gene’s temporal trajectory aligns across cell types, serving as a measure of biological regularity and predictability. High-DCI genes exhibit reproducible temporal dynamics and are markedly easier to model. By integrating DCI-based gene selection with a recurrent neural architecture trained under a Gaussian negative log-likelihood objective, we achieve superior accuracy and well-calibrated uncertainty compared to deterministic baselines. Overall, DCI reliably identifies dynamically consistent genes, and uncertainty-aware recurrent modeling provides a robust framework for capturing cross-cell-type gene-expression evolution.
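The abstract does not spell out its training loss, so the following is only the standard per-observation Gaussian negative log-likelihood, shown to make clear why such an objective rewards calibrated uncertainty. The function name `gaussian_nll` and the example numbers are illustrative assumptions, not the authors' code.

```python
import math


def gaussian_nll(y, mean, variance):
    """Negative log-likelihood of y under a Gaussian N(mean, variance)."""
    if variance <= 0:
        raise ValueError("variance must be positive")
    return 0.5 * (math.log(2 * math.pi * variance) + (y - mean) ** 2 / variance)


# Same prediction error (2.0), different predicted variances (made-up values):
overconfident = gaussian_nll(y=2.0, mean=0.0, variance=0.01)
calibrated = gaussian_nll(y=2.0, mean=0.0, variance=4.0)
```

A model that predicts a tiny variance while being far from the target pays a much larger penalty than one whose predicted variance matches the size of its error, which a deterministic squared-error baseline cannot express.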

