Predicting drug–target interactions (DTIs) is a fundamental task in computational drug discovery, yet it remains challenging under distribution shifts and limited training data. Existing approaches often suffer from poor generalization, weak cross-modal alignment between molecular and protein representations, and vulnerability to noisy supervision.

We propose ESP-DTI, a unified framework designed to enhance generalization by integrating large-scale protein language models with curriculum learning and cross-modal contrastive alignment. Specifically, we leverage ESM-2 to encode context-aware protein representations and adopt a CLIP-style contrastive objective to align drug and protein embeddings in a shared latent space. To further improve learning robustness, we introduce a progressive curriculum sampling strategy that dynamically schedules training instances based on model confidence, enabling a gradual shift from easy to hard examples.

Experimental results on four benchmark datasets demonstrate that ESP-DTI consistently outperforms state-of-the-art baselines, achieving a +3.1% improvement in average accuracy. Ablation studies confirm the complementary benefits of each component, validating their collective contribution to robust and generalizable DTI prediction.

Our work underscores the effectiveness of combining pretrained protein language models with structured training curricula and cross-modal contrastive learning for reliable DTI prediction under real-world, distribution-shifted conditions.

The source code is available at https://anonymous.4open.science/r/ESP-DTI-C926
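The abstract's CLIP-style contrastive objective can be illustrated with a minimal sketch. This is not the authors' implementation; it only shows the standard symmetric InfoNCE formulation that "CLIP-style" refers to, where matched drug–protein pairs in a batch are positives and all other pairings are negatives. The function name `clip_style_loss` and the dependency-free cosine-similarity setup are illustrative assumptions.

```python
import math

def cosine(u, v):
    # cosine similarity between two embedding vectors
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def clip_style_loss(drug_embs, prot_embs, temperature=0.07):
    """Symmetric InfoNCE over a batch of paired drug/protein embeddings.

    Illustrative sketch only: pairs sharing an index are positives,
    every other pairing in the batch serves as a negative, as in CLIP.
    """
    n = len(drug_embs)
    # temperature-scaled similarity matrix: rows = drugs, cols = proteins
    sims = [[cosine(d, p) / temperature for p in prot_embs] for d in drug_embs]

    def mean_diag_nll(rows):
        # mean negative log-likelihood of the matched (diagonal) entry
        # under a numerically stable row-wise softmax
        total = 0.0
        for i, row in enumerate(rows):
            m = max(row)
            log_z = m + math.log(sum(math.exp(s - m) for s in row))
            total += log_z - row[i]
        return total / n

    loss_d2p = mean_diag_nll(sims)  # drug -> protein direction
    cols = [[sims[i][j] for i in range(n)] for j in range(n)]
    loss_p2d = mean_diag_nll(cols)  # protein -> drug direction
    return 0.5 * (loss_d2p + loss_p2d)

# toy usage: aligned pairs should score a lower loss than shuffled pairs
drugs = [[1.0, 0.0], [0.0, 1.0]]
prots_aligned = [[1.0, 0.0], [0.0, 1.0]]
prots_shuffled = [[0.0, 1.0], [1.0, 0.0]]
print(clip_style_loss(drugs, prots_aligned) < clip_style_loss(drugs, prots_shuffled))
```

Minimizing this loss pulls each drug embedding toward its true protein target while pushing it away from the other proteins in the batch, which is the shared-latent-space alignment the abstract describes.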