Multi-view clustering (MVC) has recently garnered increasing attention for its ability to partition unlabeled samples into distinct clusters by leveraging complementary and consistent information across views. Existing MVC methods primarily combine deep neural networks with contrastive learning for cross-view representation learning, yet they often overlook the inherent global-local structural relationships among samples. Graph neural network (GNN)-based methods capture local structures but struggle to model global dependencies, leading to inferior inter-cluster separability. Transformer-based methods, in contrast, excel at global aggregation but suffer from quadratic computational complexity, and their attention-smoothing effect weakens fine-grained local structures, resulting in suboptimal intra-cluster compactness. To address these limitations, we propose a novel end-to-end MVC framework called Mamba-Driven Multi-View Discriminative Clustering via Global-Local Cross-View Sequence Modeling (MGLC). By flexibly constructing multi-view sequences, MGLC fully exploits the efficient sequence modeling capability of Mamba to jointly model cross-view dependencies and global-local structural relationships among samples. Furthermore, MGLC introduces a Cross-Mamba Fusion module to dynamically integrate cross-view and global-local structural representations. Additionally, MGLC incorporates a Dual Calibration Contrastive Learning module, guided by high-confidence pseudo-labels, that adaptively refines both feature and semantic representations while mitigating false negatives among semantically similar samples. Extensive comparative experiments and ablation studies demonstrate the effectiveness of MGLC.
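The false-negative mitigation described above can be illustrated with a minimal sketch: in a cross-view InfoNCE-style loss, samples that share a high-confidence pseudo-label with the anchor are excluded from the negative set, so semantically similar samples are not pushed apart. The function below is an assumption-laden illustration of that general idea (function name, thresholds, and loss form are our own, not the paper's exact formulation).

```python
import numpy as np

def pseudo_label_masked_infonce(z1, z2, pseudo, conf, tau=0.5, conf_thresh=0.9):
    """Cross-view InfoNCE where negatives sharing a high-confidence
    pseudo-label with the anchor are masked out.

    z1, z2 : (n, d) feature matrices from two views (same sample order)
    pseudo : (n,) integer pseudo-labels
    conf   : (n,) confidence of each pseudo-label in [0, 1]

    Illustrative sketch only; not MGLC's exact loss.
    """
    # L2-normalize both views' features
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    sim = z1 @ z2.T / tau                      # cross-view similarity matrix

    # A pair is masked (treated as a likely false negative) only when BOTH
    # samples carry the same pseudo-label with high confidence.
    confident = conf >= conf_thresh
    same_label = (pseudo[:, None] == pseudo[None, :]) \
        & confident[:, None] & confident[None, :]
    neg_mask = ~same_label
    np.fill_diagonal(neg_mask, False)          # positive pair handled separately

    exp_sim = np.exp(sim)
    pos = np.exp(np.diag(sim))                 # same-sample cross-view positive
    denom = pos + (exp_sim * neg_mask).sum(axis=1)
    return float(-np.log(pos / denom).mean())
```

Because masking only shrinks the negative set, the masked loss is never larger than the unmasked one for the same features; with low confidences the mask is empty and the loss reduces to plain cross-view InfoNCE.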
