Singapore

Protein subcellular localization prediction is essential for understanding protein function and cellular organization. However, existing methods exhibit two major limitations: (1) they overlook the critical role of evolutionarily conserved protein domains, which are fundamental functional and structural units that significantly influence protein function and subcellular localization, and (2) they rarely learn residue order and backbone coordinates simultaneously, neglecting the complementary information inherent in multi-modal representations. In this paper, we propose DMVCL, a Domain-Aware Multi-View Contrastive Representation Learning framework for protein subcellular localization prediction. First, it devises domain-sequence and domain-structure attention modules, which identify the functionally significant regions of protein sequences and structures that critically determine subcellular localization. Second, it introduces a multi-view contrastive learning framework that unites inter-view and intra-view objectives. Inter-view contrastive learning aligns protein sequences with their corresponding structures by maximizing mutual information, thereby capturing the consistency of protein residue order and backbone coordinates. Intra-view contrastive learning enhances the model’s sensitivity to subtle sequence and structural differences by pushing apart the embeddings of proteins located in different cellular compartments while pulling closer those in the same compartment. Extensive experiments demonstrate that DMVCL significantly outperforms existing baselines. Ablation studies and visualizations further highlight the contributions of domain-sequence/structure attention and multi-view contrastive learning in achieving superior predictive performance. Source code can be found at https://anonymous.4open.science/r/DMVCL-C6F0.
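The inter-view objective described above can be illustrated with an InfoNCE-style loss, a common lower bound on mutual information. This is only a minimal sketch: the abstract does not name the exact loss, so the function `info_nce`, the temperature value, and the toy embeddings are all assumptions made for illustration.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length (assumes a nonzero vector)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def info_nce(seq_embs, struct_embs, temperature=0.1):
    """Sequence-to-structure InfoNCE loss (hypothetical stand-in objective).

    Row i of seq_embs and row i of struct_embs are assumed to describe the
    same protein (the positive pair); every other row acts as a negative.
    """
    seq = [l2_normalize(v) for v in seq_embs]
    struct = [l2_normalize(v) for v in struct_embs]
    total = 0.0
    for i, s in enumerate(seq):
        # Cosine similarity to every structure embedding, scaled by temperature.
        logits = [sum(a * b for a, b in zip(s, t)) / temperature for t in struct]
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]
    return total / len(seq)

# Toy example: two proteins with 2-D embeddings.
seq_view = [[1.0, 0.0], [0.0, 1.0]]
aligned_structs = [[1.0, 0.0], [0.0, 1.0]]
swapped_structs = [[0.0, 1.0], [1.0, 0.0]]
print(info_nce(seq_view, aligned_structs) < info_nce(seq_view, swapped_structs))
```

The loss is small when each sequence embedding is closest to its own structure embedding and large when the pairs are swapped, which is the alignment behaviour the inter-view objective aims for.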

AAAI 2026

Domain-Aware Multi-View Contrastive Representation Learning for Protein Subcellular Localization Prediction

protein subcellular localization prediction

multi-view contrastive learning

multi-modal representation

poster

We are pleased to announce the Fortieth AAAI Conference on Artificial Intelligence (AAAI-26), which will be held in Singapore EXPO from January 20 to January 27, 2026.

The purpose of the AAAI conference series is to promote research in Artificial Intelligence (AI) and foster scientific exchange between researchers, practitioners, scientists, students, and engineers across the entirety of AI and its affiliated disciplines. AAAI-26 will feature technical paper presentations, special tracks, invited speakers, workshops, tutorials, poster sessions, senior member presentations, competitions, exhibit programs, and a range of other activities to be announced.



Training data detection is critical for enforcing copyright and data licensing, as Large Language Models are trained on massive text corpora scraped from the internet. We present SPECTRA, a watermarking approach that makes training data reliably detectable even when it comprises less than 0.001% of the training corpus. SPECTRA works by using an LLM to generate semantically equivalent paraphrases of a text and then computing each paraphrase's token log probabilities with a scoring model that was not trained on the text. A paraphrase is then sampled whose score, computed from these token log probabilities, is close to the score of the original text. We compare the token log probabilities of a "suspect" model to those of the scoring model to detect whether the watermarked data was used for training. We demonstrate that SPECTRA achieves a consistent p-value gap of over nine orders of magnitude when detecting data used to train a model versus data not used to train a model. SPECTRA equips data owners with a scalable, deploy-before-release watermark that survives even large-scale LLM training.
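The paraphrase selection step can be sketched as follows. This is a simplified illustration under stated assumptions: the score is taken to be the mean token log probability, the closest candidate is picked deterministically rather than sampled, and the log probabilities are made-up numbers rather than outputs of a real scoring model. The names `mean_log_prob` and `pick_paraphrase` are hypothetical.

```python
def mean_log_prob(token_log_probs):
    """Assumed score: average token log probability under the scoring model."""
    return sum(token_log_probs) / len(token_log_probs)

def pick_paraphrase(original_log_probs, paraphrase_log_probs):
    """Return the index of the paraphrase whose score is closest to the original's."""
    target = mean_log_prob(original_log_probs)
    return min(
        range(len(paraphrase_log_probs)),
        key=lambda i: abs(mean_log_prob(paraphrase_log_probs[i]) - target),
    )

# Made-up token log probabilities for one original text and three paraphrases.
original = [-1.0, -2.0, -3.0]   # score -2.0
candidates = [
    [-5.0, -5.0],               # score -5.0
    [-2.1, -1.8],               # score -1.95
    [-0.1, -0.2],               # score -0.15
]
chosen = pick_paraphrase(original, candidates)
print(chosen)
```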

Perturb Your Data: Paraphrase-Guided Training Data Watermarking

Clustering non-independent and identically distributed (non-IID) data under local differential privacy (LDP) in federated settings presents a critical challenge: preserving privacy while maintaining accuracy without iterative communication. 
Existing one-shot methods rely on unstable pairwise centroid distances or neighborhood rankings, degrading severely under strong LDP noise and data heterogeneity. 
We present Gravitational Federated Clustering (GFC), a novel approach to privacy-preserving federated clustering that overcomes the limitations of distance-based methods under varying LDP.
GFC transforms privatized client centroids into a global gravitational potential field where true cluster centers emerge as topologically persistent singularities.
Our framework introduces two key innovations: (1) a client-side compactness-aware perturbation mechanism that encodes local cluster geometry as "mass" values, and (2) a server-side topological aggregation phase that extracts stable centroids through persistent homology analysis of the potential field's superlevel sets. 
Theoretically, we establish a closed-form bound between the privacy budget $\epsilon$ and centroid estimation error, proving the potential field's Lipschitz smoothing properties exponentially suppress noise in high-density regions.
Empirically, GFC outperforms state-of-the-art methods on ten benchmarks, especially under strong LDP constraints ($\epsilon < 1$), while maintaining comparable performance at lower privacy budgets. By reformulating federated clustering as a topological persistence problem in a synthetic physics-inspired space, GFC achieves unprecedented privacy-accuracy trade-offs without iterative communication, providing a new perspective for privacy-preserving distributed learning.
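To make the potential-field idea concrete, the sketch below sums a softened, inverse-distance contribution from each privatized centroid, weighted by its "mass". It is an assumption-laden toy: the kernel, the `softening` constant, and the positive sign convention (so that cluster centers appear as peaks of superlevel sets) are illustrative choices, and the persistent homology step is not shown.

```python
import math

def potential(point, centroids, masses, softening=0.1):
    """Softened gravitational-style potential at `point` (hypothetical kernel).

    Each centroid contributes mass / sqrt(distance^2 + softening^2), so the
    field stays finite even when `point` coincides with a centroid.
    """
    total = 0.0
    for c, m in zip(centroids, masses):
        r2 = sum((p - q) ** 2 for p, q in zip(point, c))
        total += m / math.sqrt(r2 + softening ** 2)
    return total

# Two noisy centroids near the origin and one isolated centroid far away.
centroids = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0)]
masses = [1.0, 1.0, 1.0]

near_cluster = potential((0.0, 0.0), centroids, masses)
between = potential((2.5, 2.5), centroids, masses)
print(near_cluster > between)
```

Points where several centroids agree accumulate a high potential, while empty regions between clusters stay low, which is the intuition behind reading cluster centers off the field.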

Topological Federated Clustering via Gravitational Potential Fields Under Local Differential Privacy

The discovery rate of optical transients will explode to 10 million public alerts per night once the Vera C. Rubin Observatory’s Legacy Survey of Space and Time comes online, overwhelming the traditional physics-based inference pipelines. A continuous-time forecasting AI model is of interest because it can deliver millisecond-scale inference for thousands of objects per day, whereas legacy MCMC codes need hours per object. In this paper, we propose a continuous-time variational autoencoder for panels of sparse and irregularly time-sampled (gappy) astrophysical light curves that are nonstationary, heteroscedastic, and inherently dependent. Our model combines a masked GRU-ODE encoder with a latent neural ODE propagator and an interpretable Gaussian-basis decoder. The encoder learns to summarize panels of imbalanced and correlated data even when only a handful of points are observed. The neural ODE then integrates this hidden state forward in continuous time, extrapolating to future unseen epochs. This extrapolated time series is further encoded by deep sets to a latent distribution that is decoded to a weighted sum of Gaussian basis functions, the parameters of which are physically meaningful. Such parameters (e.g., rise time, decay rate, peak flux) directly drive downstream prioritization of spectroscopic follow-up for astrophysical surveys. Beyond astronomy, the architecture offers a generic recipe for interpretable and continuous-time sequence modeling in any time domain where data are multivariate, sparse, heteroscedastic, and irregularly spaced.
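The decoder's output form, a weighted sum of Gaussian basis functions, can be written out in a few lines. This is a sketch only: the parameter names `weights`, `centers`, and `widths` are assumptions, and the abstract does not specify how the model maps them to quantities such as rise time or peak flux.

```python
import math

def gaussian_basis_curve(times, weights, centers, widths):
    """Evaluate sum_k w_k * exp(-0.5 * ((t - c_k) / s_k)^2) at each time t."""
    return [
        sum(
            w * math.exp(-0.5 * ((t - c) / s) ** 2)
            for w, c, s in zip(weights, centers, widths)
        )
        for t in times
    ]

# A single basis function peaking at t = 5 with amplitude 2 and width 1.
curve = gaussian_basis_curve([5.0, 6.0], weights=[2.0], centers=[5.0], widths=[1.0])
print(curve)
```

Because each basis function has an explicit amplitude, location, and width, the decoded parameters can be inspected directly, which is what makes this kind of decoder interpretable.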

SELDON: Supernova Explosions Learned by Deep ODE Networks

The mixed truck-drone delivery system has attracted increasing attention for its potential to optimize last-mile logistics. While the Flying Sidekick Traveling Salesman Problem (FSTSP) provides a foundation for modeling the truck-drone collaboration, it falls short of capturing real-world complexities by assuming a single truck-drone pair operating on a fully connected graph. We introduce the Multi-Agent FSTSP (MA-FSTSP), which extends FSTSP to handle multiple trucks, each carrying multiple drones, operating over real road networks. Trucks must follow roads, while drones can fly directly between locations. To solve this NP-hard problem efficiently, we propose a novel three-phase algorithm that first partitions customers using a set-based distance heuristic, then computes initial truck routes via a Set TSP formulation, and finally optimizes drone deployment patterns by dynamic programming. Through extensive experiments on real-world road networks from Manhattan (1,024 nodes) and Boston (11,000 nodes), we demonstrate that our method achieves more than 30% cost reduction compared to existing approaches while scaling effectively to problems with 150 customers within a 20-minute computational time limit.

Optimization of Multi-Agent Flying Sidekick Traveling Salesman Problem over Road Networks

With the rapid advance of spatial multi-omics technologies, it has become possible to simultaneously profile transcripts, proteins and chromatin states at their native spatial coordinates, thereby uncovering molecular architecture that transcends any single-omics perspective. However, the resulting data matrices are often highly sparse and suffer from unstable dimensionality. Graph-based neural methods capture only local neighborhood information, whereas conventional Transformers, although capable of modelling long-range dependencies, incur prohibitive computational costs on such data. To overcome these limitations, we propose TLAGC, a Taylor-Linear-Attention-Guided Graph Convolutional framework that couples a Taylor-expanded linear attention (TLA) mechanism with graph convolutional networks. By eliminating the softmax operation and linking the LocalGCN via residual connections, TLA preserves local structural information while enabling the integration of global and local contexts, thereby alleviating ineffective information propagation between spatially distant yet transcriptionally similar regions. Theoretical analysis confirms that TLA indeed reduces computational complexity, and extensive experiments on multiple spatial multi-omics benchmarks demonstrate that TLAGC consistently outperforms state-of-the-art baselines in delineating spatial domains.
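One way to see how a Taylor expansion removes the softmax is to approximate exp(q·k) by its first-order expansion 1 + q·k, which lets the sums over keys be computed once and reused for every query. The sketch below assumes a first-order expansion with unit-normalized queries and keys; the abstract does not state the expansion order or normalization actually used, so `taylor_linear_attention` is a hypothetical illustration rather than the TLA module itself.

```python
import math

def normalize(v):
    """Scale a vector to unit length (assumes a nonzero vector)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def taylor_linear_attention(Q, K, V):
    """Attention with weights (1 + q.k) instead of exp(q.k).

    With unit-norm q and k, 1 + q.k lies in [0, 2], so weights are non-negative.
    Cost is linear in the number of keys because the key/value sums are shared.
    """
    n, d, dv = len(K), len(K[0]), len(V[0])
    Qn = [normalize(q) for q in Q]
    Kn = [normalize(k) for k in K]
    # Shared summaries, computed once: sum_j k_j, sum_j v_j, sum_j k_j v_j^T.
    k_sum = [sum(k[a] for k in Kn) for a in range(d)]
    v_sum = [sum(v[b] for v in V) for b in range(dv)]
    kv = [[sum(Kn[j][a] * V[j][b] for j in range(n)) for b in range(dv)]
          for a in range(d)]
    out = []
    for q in Qn:
        # Denominator: sum_j (1 + q.k_j) = n + q.(sum_j k_j); assumed nonzero.
        denom = n + sum(q[a] * k_sum[a] for a in range(d))
        # Numerator: sum_j (1 + q.k_j) v_j = sum_j v_j + q^T (sum_j k_j v_j^T).
        numer = [v_sum[b] + sum(q[a] * kv[a][b] for a in range(d))
                 for b in range(dv)]
        out.append([x / denom for x in numer])
    return out

Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0], [3.0]]
result = taylor_linear_attention(Q, K, V)
print(result)
```

For the first query the weights are 2 and 1, giving (2·1 + 1·3)/3 = 5/3, which matches what the explicit quadratic computation would produce.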

TLAGC: Taylor Linear Attention-Guided Graph Convolutions for Revealing Spatial Domains in Spatial Multi-Omics Data

Despite significant advancements in dynamic neural rendering, existing methods fail to address the unique challenges posed by UAV-captured scenarios, particularly those involving monocular camera setups, top-down perspective, and multiple small, moving humans, which are not adequately represented in existing datasets. In this work, we introduce UAV4D, a framework for enabling photorealistic rendering for dynamic real-world scenes captured by UAVs. Specifically, we address the challenge of reconstructing dynamic scenes with multiple moving pedestrians from monocular video data without the need for additional sensors. We use a combination of a 3D foundation model and a human mesh reconstruction model to reconstruct both the scene background and humans. We propose a novel approach to resolve the scene scale ambiguity and place both humans and the scene in world coordinates by identifying human-scene contact points. Additionally, we exploit the SMPL model and background mesh to initialize Gaussian splats, enabling holistic scene rendering. We evaluated our method on three complex UAV-captured datasets: VisDrone, Manipal-UAV, and Okutama-Action, each with distinct characteristics and 10-50 humans. Our results demonstrate the benefits of our approach over existing methods in novel view synthesis, achieving a 1.5 dB PSNR improvement and superior visual sharpness.

UAV4D: Dynamic Neural Rendering of Human-Centric UAV Imagery Using Gaussian Splatting

Autonomous driving systems have achieved remarkable capabilities in real-world deployment, yet ensuring safety under corner cases remains a significant challenge due to the scarcity and constrained diversity of safety-critical scenarios. Existing generation methods either produce irrational vehicle behaviors or are limited to fixed collision patterns, and both kinds rely heavily on existing map datasets, which restricts diversity. To address these fundamental limitations, we introduce **Any2Critical**, the first framework that can encode arbitrary real-world scenarios and generate contextually relevant safety-critical scenarios with realistic driving behaviors. Specifically, Any2Critical addresses two key challenges: (1) developing comprehensive, diverse map data by successfully leveraging everyday traffic situations as the most abundant source of real-world driving contexts, and (2) proposing a RAG-based Safety-Critical Scenario Generation Strategy based on our curated NHTSA-5K database for achieving an optimal balance between scenario diversity and behavioral rationality. Through comprehensive evaluation, we demonstrate that Any2Critical consistently achieves an average collision rate of 89.69% across diverse scenarios and autonomous driving systems, significantly outperforming current state-of-the-art generation methods.

Any2Critical: Safety-Critical Scenario Generation from Arbitrary Real-World Driving Contexts

LiDAR-based 3D object detection is widely used in safety-critical systems. However, these systems remain vulnerable to backdoor attacks that embed hidden malicious behaviors during training. A key limitation of existing backdoor attacks is their lack of physical realizability, primarily due to the digital-to-physical domain gap. Digital triggers often fail in real-world settings because they overlook material-dependent LiDAR reflection properties. On the other hand, physically constructed triggers are often unoptimized, leading to low effectiveness or easy detectability.
This paper introduces Material-Oriented Backdoor Attack (MOBA), a novel framework that bridges the digital–physical gap by explicitly modeling the material properties of real-world triggers. MOBA tackles two key challenges in physical backdoor design: 1) robustness of the trigger material under diverse environmental conditions, 2) alignment between the physical trigger’s behavior and its digital simulation. First, we propose a systematic approach to selecting robust trigger materials, identifying titanium dioxide (TiO$_2$) for its high diffuse reflectivity and environmental resilience. Second, to ensure the digital trigger accurately mimics the physical behavior of the material-based trigger, we develop a novel simulation pipeline that features: (1) an angle-independent approximation of the Oren–Nayar BRDF model to generate realistic LiDAR intensities, and (2) a distance-aware scaling mechanism to maintain spatial consistency across varying depths. We conduct extensive experiments on state-of-the-art LiDAR-based and Camera-LiDAR fusion models, showing that MOBA achieves a 93.50% attack success rate, outperforming prior methods by over 41%. Our work reveals a new class of physically realizable threats and underscores the urgent need for defenses that account for material-level properties in real-world environments.

MOBA: A Material-Oriented Backdoor Attack Against LiDAR-Based 3D Object Detection Systems

A reliable foundation model of functional neuroimages is critical to promote clinical applications where the performance of current AI models is significantly impeded by a limited sample size.
To that end, tremendous efforts have been made to pretrain large models on extensive unlabeled fMRI data using scalable self-supervised learning.
Since self-supervision is not necessarily aligned with the brain-to-outcome relationship, most foundation models are suboptimal for downstream tasks such as predicting disease outcomes.
By capitalizing on rich environmental variables and demographic data along with an unprecedented amount of functional neuroimages, we formulate brain modeling as multitask learning and present a scalable model architecture for (i) multitask pretraining by tokenizing multiple brain-environment interactions (BEI) and (ii) semi-supervised finetuning by assigning pseudo-labels of default BEI.
We have evaluated our foundation model on a variety of applications, including sex prediction, human behavior recognition, and early diagnosis of autism, Parkinson's disease, Alzheimer's disease, and schizophrenia, where promising results indicate the great potential to facilitate current neuroimaging applications in clinical routines.

Large Connectome Model: An fMRI Foundation Model of Brain Connectomes Empowered by Brain-Environment Interaction in Multitask Learning Landscape

Molecular property prediction is a crucial task that guides the design of new compounds, including drugs and materials. While explainable artificial intelligence methods aim to scrutinize model predictions by identifying influential molecular substructures, many existing approaches rely on masking strategies that remove either atoms or atom-level features to assess importance via fidelity metrics. These methods, however, often fail to adhere to the underlying molecular distribution and thus yield unintuitive explanations. In this work, we propose counterfactual masking, a novel framework that replaces masked substructures with chemically reasonable fragments sampled from generative models trained to complete molecular graphs. Rather than evaluating masked predictions against implausible zeroed-out baselines, we assess them relative to counterfactual molecules drawn from the data distribution. Our method offers two key benefits: (1) molecular realism underpinning robust and distribution-consistent explanations, and (2) meaningful counterfactuals that directly indicate how structural modifications may affect predicted properties. We demonstrate that counterfactual masking is well-suited for benchmarking model explainers and yields more actionable insights across multiple datasets and property prediction tasks. Our approach bridges the gap between explainability and molecular design, offering a principled and generative path toward explainable machine learning in chemistry.
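The comparison against counterfactual molecules, rather than a zeroed-out baseline, can be sketched as a simple difference of predictions. Everything here is a toy stand-in: `toy_predict` replaces a trained property model, the SMILES-like strings replace molecular graphs, and the counterfactual list is written by hand instead of being sampled from a generative model.

```python
from statistics import mean

def counterfactual_importance(predict, molecule, counterfactuals):
    """Importance of a substructure: prediction for the original molecule minus
    the average prediction over molecules where that substructure was replaced."""
    return predict(molecule) - mean(predict(cf) for cf in counterfactuals)

def toy_predict(smiles):
    """Hypothetical property model: counts oxygen atoms in a SMILES-like string."""
    return float(smiles.count("O"))

# Original molecule and hand-written replacements for its terminal oxygen.
original_molecule = "CCO"
replacements = ["CCC", "CCN"]

importance = counterfactual_importance(toy_predict, original_molecule, replacements)
print(importance)
```

A large absolute value indicates that swapping the substructure for plausible alternatives changes the prediction substantially, and the sign shows the direction of that change.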
