Steering vectors (SVs) are a powerful technique for controlling Large Language Models (LLMs) by manipulating their activations without altering model weights. However, when constructed from sensitive data, an SV poses significant privacy risks, as it may leak private information. Existing differential privacy (DP) techniques for constructing SVs cannot be directly applied to training-based SV construction paradigms, which offer higher task performance. In this work, we present PrivSV, a general privacy-preserving approach for constructing SVs with DP guarantees that is compatible with arbitrary SV construction paradigms while maintaining high utility. PrivSV comprises three novel methods: a Layer-wise Noise-Resilient Reduction (LNR²) method that reduces the noise injected into high-dimensional SVs; a Directional Prior Compensation (DPC) method that recovers utility degraded by noise perturbation; and a Privacy-Aware Optimal Parameter Determination (POPD) method that adaptively maximizes the performance of the final compensated SV. Extensive experiments on open-source LLMs from different families (i.e., LLaMA, Qwen, Mistral, and Gemma) demonstrate that PrivSV outperforms several existing techniques across various privacy budgets.
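To make the two core ideas in the abstract concrete, here is a minimal sketch of (a) applying a steering vector to a layer's hidden activations and (b) privatizing an SV with the standard Gaussian mechanism. All function names, parameters, and the scale choice are illustrative assumptions, not the authors' PrivSV implementation (which additionally applies LNR², DPC, and POPD).

```python
import numpy as np

def apply_steering(hidden, sv, alpha=1.0):
    # Activation steering: add a scaled steering vector to the
    # hidden states of one layer, leaving model weights unchanged.
    return hidden + alpha * sv

def gaussian_mechanism(sv, sensitivity, epsilon, delta, rng=None):
    # Classic (epsilon, delta)-DP Gaussian mechanism: perturb the SV
    # with noise calibrated to its L2 sensitivity. Naively applied,
    # the noise scale grows costly for high-dimensional SVs, which is
    # the utility problem the abstract's methods target.
    rng = np.random.default_rng(0) if rng is None else rng
    sigma = sensitivity * np.sqrt(2 * np.log(1.25 / delta)) / epsilon
    return sv + rng.normal(0.0, sigma, size=sv.shape)

hidden = np.zeros(8)          # toy activation of one layer
sv = np.ones(8)               # toy steering vector
steered = apply_steering(hidden, sv, alpha=0.5)
noisy_sv = gaussian_mechanism(sv, sensitivity=1.0, epsilon=1.0, delta=1e-5)
```

The sketch uses a fixed seed only so the toy run is reproducible; a real deployment would draw fresh noise per release of the SV.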
