Singapore

Accurate prediction of compound-protein interactions (CPIs) is crucial for drug discovery. 
However, existing deep learning-based methods suffer from hidden biases and poor cross-domain generalization, leading to spurious correlations and inadequate representation of unseen compound-protein pairs. 
To address these limitations, we propose FuseMine, a multimodal deep learning framework that jointly leverages molecular structures and biological sequences for reliable CPI prediction.
Specifically, FuseMine adopts a dual-representation strategy for each molecule. It employs a convolutional encoder to capture structural features, combined with pretrained large language models that extract semantic information from sequences. We propose a novel Multi-modal Feature Orchestration Aggregation (MFOA) module that enables deep and synergistic fusion between the structural features and the sequential semantics of molecules, effectively capturing the complementary patterns across modalities. Additionally, we design a Reduction Differential Feature Mining (RDFM) module to further enhance the representation of discriminative features, thereby improving the model’s generalization capability. Extensive experiments on multiple benchmark datasets demonstrate that our framework consistently outperforms state-of-the-art methods in both intra-domain and cross-domain scenarios. These results highlight the synergistic value of combining structural and sequential data for CPI prediction. Code is available at https://anonymous.4open.science/r/FuseMine.

AAAI 2026

FuseMine: Robust Multi-Modal Compound-Protein Interaction Prediction via Differential Attention Feature Mining

compound-protein interaction prediction

app: natural sciences

ml: applications

drug discovery

poster

We are pleased to announce the Fortieth AAAI Conference on Artificial Intelligence (AAAI-26), which will be held at Singapore EXPO from January 20 to January 27, 2026.

The purpose of the AAAI conference series is to promote research in Artificial Intelligence (AI) and foster scientific exchange between researchers, practitioners, scientists, students, and engineers across the entirety of AI and its affiliated disciplines. AAAI-26 will feature technical paper presentations, special tracks, invited speakers, workshops, tutorials, poster sessions, senior member presentations, competitions, exhibit programs, and a range of other activities to be announced.



Frequency Modulated Continuous Wave (FMCW) radars can measure subtle chest wall oscillations to enable non-contact heartbeat sensing. However, traditional radar-based heartbeat sensing methods face performance degradation due to noise. Learning-based radar methods achieve better noise robustness but require costly labeled signals for supervised training. To overcome these limitations, we propose the first unsupervised framework for radar-based heartbeat sensing via Augmented Pseudo-Label and Noise Contrast (Radar-APLANC). We propose to use both the heartbeat range and noise range within the radar range matrix to construct the positive and negative samples, respectively, for improved noise robustness. Our Noise-Contrastive Triplet (NCT) loss only utilizes positive samples, negative samples, and pseudo-label signals generated by the traditional radar method, thereby avoiding dependence on expensive ground-truth physiological signals. We further design a pseudo-label augmentation approach featuring adaptive noise-aware label selection to improve pseudo-label signal quality. Extensive experiments on the Equipleth dataset and our collected radar dataset demonstrate that our unsupervised method achieves performance comparable to state-of-the-art supervised methods.

Radar-APLANC: Unsupervised Radar-based Heartbeat Sensing via Augmented Pseudo-Label and Noise Contrast

User interface (UI) design is an iterative process in which designers progressively refine their work with design software such as Figma or Sketch. Recent advances in vision–language models (VLMs) with tool invocation suggest these models can operate the design software to edit a UI design through iteration. Understanding and enhancing this capacity is important, as it highlights VLMs’ potential to collaborate with designers within conventional software. However, as no existing benchmark evaluates the tool-based design performance, the capacity remains unknown. To address this, we introduce CANVAS, a benchmark for VLMs on tool-based user interface design. Our benchmark contains 598 tool-based design tasks paired with ground-truth references sampled from 3.3K mobile UI designs across 30 function-based categories (e.g., onboarding, messaging). In each task, a VLM updates the design step-by-step, through context-based tool invocations (e.g., create a rectangle as a button background), linked to design software. Specifically, CANVAS incorporates two task types: (i) design replication evaluates the ability to reproduce a whole UI screen; (ii) design modification evaluates the ability to modify a specific part of an existing screen. Results suggest that leading models exhibit more strategic tool invocations, improving design quality. Furthermore, we identify common error patterns models exhibit, guiding future work in enhancing tool-based design capabilities.

CANVAS: A Benchmark for Vision-Language Models on Tool-Based User Interface Design

Text-attributed graphs, where nodes are enriched with textual attributes, have become a powerful tool for modeling real-world networks such as citation, social, and transaction networks. However, existing methods for learning from these graphs often assume that the distributions of training and testing data are consistent. This assumption leads to significant performance degradation when faced with out-of-distribution (OOD) data. In this paper, we address the challenge of node-level OOD detection in text-attributed graphs, with the goal of maintaining accurate node classification while simultaneously identifying OOD nodes. We propose a novel approach, LLM-Enhanced Energy Contrastive Learning for Out-of-Distribution Detection in Text-Attributed Graphs (LECT), which integrates large language models (LLMs) and energy-based contrastive learning. The proposed method involves generating high-quality OOD samples by leveraging the semantic understanding and contextual knowledge of LLMs to create dependency-aware pseudo-OOD nodes, and applying contrastive learning based on energy functions to distinguish between in-distribution (IND) and OOD nodes. The effectiveness of our method is demonstrated through extensive experiments on six benchmark datasets, where our method consistently outperforms state-of-the-art baselines, achieving both high classification accuracy and robust OOD detection capabilities.

LLM-Enhanced Energy Contrastive Learning for Out-of-Distribution Detection in Text-Attributed Graphs
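The LECT abstract refers to energy functions for separating in-distribution and OOD nodes but does not spell out the exact form. As a minimal sketch, assuming the widely used logit-based free-energy score (an assumption, not necessarily the authors' formulation), a node's energy can be computed from its classifier logits and thresholded. The names `energy_score`, `is_ood`, and the threshold value are hypothetical.

```python
import math


def energy_score(logits, temperature=1.0):
    """Free energy E(x) = -T * logsumexp(logits / T).

    Lower energy suggests an in-distribution input; higher energy suggests OOD.
    Assumption: this is the common logit-based score, not LECT's exact function.
    """
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    lse = m + math.log(sum(math.exp(s - m) for s in scaled))
    return -temperature * lse


def is_ood(logits, threshold, temperature=1.0):
    """Flag a node as OOD when its energy exceeds a (hypothetical) threshold."""
    return energy_score(logits, temperature) > threshold


confident = [8.0, 0.5, -1.0]   # one class clearly dominates
uncertain = [0.2, 0.1, 0.0]    # flat logits, no class stands out

print(energy_score(confident), energy_score(uncertain))
print(is_ood(confident, threshold=-3.0), is_ood(uncertain, threshold=-3.0))
```

In practice the threshold would be chosen on validation data; the value used here is only for illustration.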

Recent unsupervised domain adaptation (UDA) methods have shown great success in addressing classical domain shifts (e.g., synthetic-to-real), but they still suffer under complex shifts (e.g., geographical shift), where both the background and object appearances differ significantly across domains. Prior works showed that the language modality can help in the adaptation process, exhibiting more robustness to such complex shifts. In this paper, we introduce TRUST, a novel UDA approach that exploits the robustness of the language modality to guide the adaptation of a vision model. TRUST generates pseudo-labels for target samples from their captions and introduces a novel uncertainty estimation strategy that uses normalised CLIP similarity scores to estimate the uncertainty of the generated pseudo-labels. Such estimated uncertainty is then used to reweight the classification loss, mitigating the adverse effects of wrong pseudo-labels obtained from low-quality captions. To further increase the robustness of the vision model, we propose a multimodal soft-contrastive learning loss that aligns the vision and language feature spaces, by leveraging captions to guide the contrastive training of the vision model on target images. In our contrastive loss, each pair of images acts as both a positive and a negative pair and their feature representations are attracted and repulsed with a strength proportional to the similarity of their captions. This solution avoids the need to hard-assign positive and negative pairs, which is critical in the UDA setting. Our approach outperforms previous methods, setting a new state of the art on classical (DomainNet) and complex (GeoNet) domain shifts.

TRUST: Leveraging Text Robustness for Unsupervised Domain Adaptation

In multi-view clustering (MVC), complementary and consistent information from multiple views is integrated to improve clustering performance. However, inter-view sample correspondences may be partially missing in practice, making it difficult to learn cross-view consistency, which leads to the partially view-aligned problem (PVP). Most existing partially view-aligned clustering (PVC) methods first learn cross-view consistent representations based on known alignments, and then recover missing correspondences by measuring cross-view similarity between samples. However, such an indirect alignment recovery process depends on high-quality consistent representations and lacks effective utilization of known alignments, often resulting in sub-optimal outcomes. To address this, we propose a novel direct alignment recovery perspective, instantiated as K-Nearest Neighbors Direct Alignment (KNNDA). Specifically, we first construct an alignment domain by mapping the aligned neighbors of each unaligned sample into the aligned view. Then, we compute alignment confidence based on the similarity between known aligned pairs of neighbors. In particular, we use a dynamic threshold to filter out unreliable alignments. Finally, new alignments are generated within the high-confidence alignment domain. Contrastive loss is used to learn consistent representations for clustering. Comprehensive experiments on several real-world datasets show the effectiveness and superiority of our module in partially view-aligned clustering.

KNNDA: A New Perspective of Alignment Recovery for Partially View-Aligned Clustering

Accurate vessel segmentation in X-ray angiograms is crucial for numerous clinical applications. However, the scarcity of annotated data presents a significant challenge, which has driven the adoption of self-supervised learning (SSL) methods such as masked image modeling (MIM) to leverage large-scale unlabeled data for learning transferable representations. Unfortunately, conventional MIM often fails to capture vascular anatomy because of the severe class imbalance between vessel and background pixels, leading to weak vascular representations. To address this, we introduce Vascular anatomy-aware Masked Image Modeling (VasoMIM), a novel MIM framework tailored for X-ray angiograms that explicitly integrates anatomical knowledge into the pre-training process. Specifically, it comprises two complementary components: anatomy-guided masking strategy and anatomical consistency loss. The former preferentially masks vessel-containing patches to focus the model on reconstructing vessel-relevant regions. The latter enforces consistency in vascular semantics between the original and reconstructed images, thereby improving the discriminability of vascular representations. Empirically, VasoMIM achieves state-of-the-art performance across three datasets. These findings highlight its potential to facilitate X-ray angiogram analysis.

VasoMIM: Vascular Anatomy-Aware Masked Image Modeling for Vessel Segmentation
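The VasoMIM abstract describes an anatomy-guided masking strategy that preferentially masks vessel-containing patches. One way this could look, as a rough sketch only, is weighted sampling without replacement, where each patch's chance of being masked grows with its vessel-pixel fraction. The function name, the `eps` floor, the default mask ratio, and the sampling scheme (Efraimidis-Spirakis keys) are all assumptions, not the authors' implementation.

```python
import random


def anatomy_guided_mask(vessel_fraction, mask_ratio=0.75, eps=1e-3, seed=0):
    """Pick patch indices to mask, favouring patches with more vessel pixels.

    vessel_fraction: per-patch fraction of vessel pixels in [0, 1]
                     (assumed to come from some vessel prior; hypothetical input).
    eps: small floor so background patches can still be masked occasionally.
    """
    rng = random.Random(seed)
    n_mask = int(mask_ratio * len(vessel_fraction))
    # Efraimidis-Spirakis: draw key u ** (1 / weight) and keep the largest keys.
    keys = [
        (rng.random() ** (1.0 / (frac + eps)), idx)
        for idx, frac in enumerate(vessel_fraction)
    ]
    keys.sort(reverse=True)
    return sorted(idx for _, idx in keys[:n_mask])


fractions = [0.9, 0.0, 0.0, 0.8, 0.0, 0.1, 0.0, 0.0]
masked = anatomy_guided_mask(fractions, mask_ratio=0.25)
print(masked)
```

Setting `eps` larger moves the behaviour toward uniform random masking, which is the conventional MIM baseline the abstract contrasts against.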

Direct Preference Optimization (DPO) simplifies reinforcement learning from human feedback (RLHF) for large language models (LLMs) by directly training on offline preference data to align with human preferences. During DPO training, the reference model serves as a data weight adjuster. However, the common practice of initializing the policy and reference models identically in DPO can lead to inefficient data utilization and impose a performance ceiling. Meanwhile, the absence of a reference model in Simple Preference Optimization (SimPO) reduces training robustness and requires stricter conditions to prevent catastrophic forgetting. In this work, we propose Pre-DPO, a simple yet effective DPO-based training paradigm that improves preference optimization by introducing a guiding reference model. This reference model provides foresight into the desired policy state achievable through the training preference data, serving as a guiding mechanism that adaptively assigns higher weights to samples more suitable for the model and lower weights to those less suitable. Extensive experiments on the AlpacaEval 2 and Arena-Hard v0.1 benchmarks demonstrate that Pre-DPO consistently improves the performance of both DPO and SimPO, without relying on external models or additional data. Our code, data, and technical appendix can be found in the Supplementary Material.

Pre-DPO: Improving Data Utilization in Direct Preference Optimization Using a Guiding Reference Model
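The Pre-DPO abstract describes the reference model as a data weight adjuster. To make that concrete, the sketch below computes the standard per-example DPO loss from sequence log-probabilities, together with the scalar that multiplies each example's gradient. It illustrates plain DPO only; how Pre-DPO obtains its guiding reference model is not shown, and the function names and the default `beta` are assumptions.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one preference pair: -log(sigmoid(beta * margin))."""
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    x = beta * margin
    # -log(sigmoid(x)) == softplus(-x), written in a numerically stable form.
    return max(-x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_sample_weight(policy_chosen_logp, policy_rejected_logp,
                      ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Gradient weight sigmoid(-beta * margin) = 1 - exp(-loss).

    The weight depends on the reference log-probabilities, which is why the
    choice of reference model changes how strongly each example is used.
    """
    loss = dpo_loss(policy_chosen_logp, policy_rejected_logp,
                    ref_chosen_logp, ref_rejected_logp, beta)
    return 1.0 - math.exp(-loss)


# Policy identical to the reference: margin is zero, loss is log(2), weight 0.5.
print(dpo_loss(-5.0, -7.0, -5.0, -7.0))
print(dpo_sample_weight(-5.0, -7.0, -5.0, -7.0))
```

Passing different reference log-probabilities for the same policy values changes the weight, which is the lever a guiding reference model would act on.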

Partial Label Learning (PLL) aims to train multi-class classifiers from examples where each instance is associated with a set of candidate labels, among which the ground-truth label is assumed to be included. While most existing studies assume that partial labels are both instance-independent and reliable, such assumptions often break down in real-world scenarios, where candidate sets may depend on instance-specific features and even exclude the ground-truth label. In this work, we investigate a more realistic setting termed Unreliable Instance-Dependent Partial Label Learning (UIDPLL). To address the challenges in UIDPLL, we propose a novel framework named Neighborhood-guided Label Augmentation and Pruning (NLAP). NLAP exploits the structural consistency among neighboring instances to progressively refine candidate label sets and integrates classifier feedback to disambiguate labels during training. This progressive mechanism improves classification performance by tackling ambiguity caused by noise and instance dependency in partial labels. Furthermore, we provide theoretical guarantees for the proposed NLAP framework, demonstrating that label ambiguity can be effectively reduced through appropriate refinement and pruning procedures. Extensive experiments on both benchmark and real-world datasets demonstrate the robustness and effectiveness of the proposed method.

Neighbor-aware Label Refinement: Enhancing Unreliable Instance-Dependent Partial Labels

Despite the remarkable success of Large Language Models (LLMs), evaluating their outputs' quality regarding preference remains a critical challenge. While existing works usually leverage a strong LLM as the judge for comparing LLMs' responses pairwise, such a single-evaluator approach is vulnerable to cyclic preference, i.e., output A is better than B, B than C, but C is better than A, causing contradictory evaluation results. To address this, we introduce PGED (Preference Graph Ensemble and Denoise), a novel approach that leverages multiple model-based evaluators to construct preference graphs, and then ensembles and denoises these graphs for acyclic, non-contradictory evaluation results. We provide theoretical guarantees for our framework, demonstrating its efficacy in recovering the ground truth preference structure. Extensive experiments on ten benchmarks demonstrate PGED's superiority in three applications: 1) model ranking for evaluation, 2) response selection for test-time scaling, and 3) data selection for model fine-tuning. Notably, PGED combines small LLM evaluators (e.g., Llama3-8B, Mistral-7B, Qwen2-7B) to outperform strong ones (e.g., Qwen2-72B), showcasing its effectiveness in enhancing evaluation reliability and improving model performance.

Towards Acyclic Preference Evaluation of Language Models via Multiple Evaluators
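The cyclic-preference problem in the PGED abstract (A beats B, B beats C, C beats A) can be illustrated by treating pairwise judgments as a directed graph and checking it for cycles. The sketch below only detects the contradiction with a depth-first search; it does not implement PGED's ensembling or denoising, and the `(winner, loser)` edge representation is an assumption made for illustration.

```python
def has_cycle(edges):
    """Return True if the preference graph contains a directed cycle.

    edges: iterable of (winner, loser) pairs, one per pairwise judgment.
    """
    graph = {}
    for winner, loser in edges:
        graph.setdefault(winner, []).append(loser)
        graph.setdefault(loser, [])

    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished
    color = {node: WHITE for node in graph}

    def visit(node):
        color[node] = GRAY
        for nxt in graph[node]:
            if color[nxt] == GRAY:  # back edge: we returned to the current path
                return True
            if color[nxt] == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color[node] == WHITE and visit(node) for node in graph)


cyclic = [("A", "B"), ("B", "C"), ("C", "A")]
consistent = [("A", "B"), ("B", "C"), ("A", "C")]
print(has_cycle(cyclic), has_cycle(consistent))
```

An acyclic graph admits a topological order, which is what makes a non-contradictory ranking possible.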

Molecular structure generation from mass spectrometry is fundamental for understanding cellular metabolism and discovering novel compounds. Although tandem mass spectrometry (MS/MS) enables the high-throughput acquisition of fragment fingerprints, these spectra often reflect higher-order interactions involving the concerted cleavage of multiple atoms and bonds, which are crucial for resolving complex isomers and non-local fragmentation mechanisms.
However, most existing methods adopt atom-centric and pairwise interaction modeling, overlooking higher-order edge interactions and lacking the capacity to systematically capture essential many-body characteristics for structure generation.
To overcome these limitations, we present MBGen, a Many-Body enhanced diffusion framework for de novo molecular Generation from mass spectra.
By integrating a novel many-body attention mechanism and higher-order edge modeling, MBGen comprehensively leverages the rich structural information encoded in MS/MS spectra, enabling accurate de novo generation and isomer differentiation for novel molecules. 
Experimental results on the NPLIB1 and MassSpecGym benchmarks demonstrate that MBGen achieves superior performance, with improvements of up to 230% over state-of-the-art methods, highlighting the scientific value and practical utility of many-body modeling for mass spectrometry-based molecular generation. Further analysis and ablation studies show that our approach effectively captures higher-order interactions and exhibits enhanced sensitivity to complex isomeric and non-local fragmentation information.
