Predicting drug–target interactions (DTIs) is a fundamental task in computational drug discovery, yet it remains challenging under distribution shifts and limited training data. Existing approaches often suffer from poor generalization, weak cross-modal alignment between molecular and protein representations, and vulnerability to noisy supervision.

We propose ESP-DTI, a unified framework designed to enhance generalization by integrating large-scale protein language models with curriculum learning and cross-modal contrastive alignment. Specifically, we leverage ESM-2 to encode context-aware protein representations and adopt a CLIP-style contrastive objective to align drug and protein embeddings in a shared latent space. To further improve learning robustness, we introduce a progressive curriculum sampling strategy that dynamically schedules training instances based on model confidence, enabling a gradual shift from easy to hard examples.

Experimental results on four benchmark datasets demonstrate that ESP-DTI consistently outperforms state-of-the-art baselines, achieving a +3.1% improvement in average accuracy. Ablation studies confirm the complementary benefits of each component, validating their collective contribution to robust and generalizable DTI prediction.

Our work underscores the effectiveness of combining pretrained protein language models with structured training curricula and cross-modal contrastive learning for reliable DTI prediction under real-world, distribution-shifted conditions.

The source code is available at https://anonymous.4open.science/r/ESP-DTI-C926
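The abstract's CLIP-style contrastive objective can be illustrated with a minimal sketch. This is not the authors' implementation; it only shows the standard symmetric InfoNCE formulation that "CLIP-style" refers to, where matched drug–protein pairs in a batch are positives and all other pairings are negatives. The function name `clip_style_loss` and the dependency-free cosine-similarity setup are illustrative assumptions.

```python
import math

def cosine(u, v):
    # cosine similarity between two embedding vectors
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def clip_style_loss(drug_embs, prot_embs, temperature=0.07):
    """Symmetric InfoNCE over a batch of paired drug/protein embeddings.

    Illustrative sketch only: pairs sharing an index are positives,
    every other pairing in the batch serves as a negative, as in CLIP.
    """
    n = len(drug_embs)
    # temperature-scaled similarity matrix: rows = drugs, cols = proteins
    sims = [[cosine(d, p) / temperature for p in prot_embs] for d in drug_embs]

    def mean_diag_nll(rows):
        # mean negative log-likelihood of the matched (diagonal) entry
        # under a numerically stable row-wise softmax
        total = 0.0
        for i, row in enumerate(rows):
            m = max(row)
            log_z = m + math.log(sum(math.exp(s - m) for s in row))
            total += log_z - row[i]
        return total / n

    loss_d2p = mean_diag_nll(sims)  # drug -> protein direction
    cols = [[sims[i][j] for i in range(n)] for j in range(n)]
    loss_p2d = mean_diag_nll(cols)  # protein -> drug direction
    return 0.5 * (loss_d2p + loss_p2d)

# toy usage: aligned pairs should score a lower loss than shuffled pairs
drugs = [[1.0, 0.0], [0.0, 1.0]]
prots_aligned = [[1.0, 0.0], [0.0, 1.0]]
prots_shuffled = [[0.0, 1.0], [1.0, 0.0]]
print(clip_style_loss(drugs, prots_aligned) < clip_style_loss(drugs, prots_shuffled))
```

Minimizing this loss pulls each drug embedding toward its true protein target while pushing it away from the other proteins in the batch, which is the shared-latent-space alignment the abstract describes.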