Property-constrained molecular generation and editing are crucial in AI-driven drug discovery but are hindered by two factors: (i) the complex relationships between molecular structures and multiple properties are difficult to capture, and (ii) the narrow range and incomplete annotation of molecular properties limit the effectiveness of models that rely heavily on property information. To address these limitations, we propose HSPAG, a data-efficient framework featuring hierarchical structure–property alignment. By treating SMILES and molecular properties as complementary modalities, the model learns their relationships at the atom, substructure, and whole-molecule levels during pre-training, thereby implicitly encoding property information into molecular representations. Moreover, we select representative samples through scaffold clustering and hard samples via an auxiliary variational auto-encoder (VAE), substantially reducing the amount of pre-training data required. In addition, we incorporate a property relevance-aware masking mechanism and diversified perturbation strategies to enhance generation quality under sparse annotations. Experimental results demonstrate that HSPAG captures fine-grained structure–property relationships, enabling controllable molecular generation under multiple property constraints. Two real-world case studies further validate the editing capabilities of HSPAG, highlighting its practical potential in lead compound screening and optimization.
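As a rough illustration of the idea of picking representative samples through scaffold clustering, the sketch below groups molecules by a precomputed scaffold string and then draws from the clusters in round-robin order until a budget is reached. This is not the HSPAG procedure, which the abstract does not specify: the function name `select_representatives`, the `budget` parameter, the round-robin rule, and the assumption that scaffolds are already available as strings (for example, Bemis-Murcko scaffolds computed with a cheminformatics toolkit) are all hypothetical choices made for illustration.

```python
from collections import defaultdict
from itertools import zip_longest


def select_representatives(molecules, budget):
    """Pick up to `budget` SMILES, spreading picks across scaffold clusters.

    `molecules` is a list of (smiles, scaffold) pairs. The scaffold strings
    are assumed to be precomputed; this sketch does not parse SMILES.
    """
    if budget <= 0:
        return []

    # Group molecules that share the same scaffold string.
    clusters = defaultdict(list)
    for smiles, scaffold in molecules:
        clusters[scaffold].append(smiles)

    # Visit larger clusters first so frequent scaffolds are covered early.
    # sorted() is stable, so equally sized clusters keep insertion order.
    ordered = sorted(clusters.values(), key=len, reverse=True)

    selected = []
    # Round-robin: take one molecule from each cluster per pass.
    for layer in zip_longest(*ordered):
        for smiles in layer:
            if smiles is None:  # this cluster is already exhausted
                continue
            selected.append(smiles)
            if len(selected) == budget:
                return selected
    return selected


# Toy data: three benzene-scaffold molecules, one cyclohexane, one pyridine.
toy = [
    ("c1ccccc1O", "c1ccccc1"),
    ("c1ccccc1N", "c1ccccc1"),
    ("C1CCCCC1O", "C1CCCCC1"),
    ("c1ccccc1C", "c1ccccc1"),
    ("c1ccncc1C", "c1ccncc1"),
]
subset = select_representatives(toy, budget=3)
```

With a budget of three, the toy call returns one molecule per scaffold, so the subset covers every cluster before any scaffold is sampled twice. A real pipeline would likely weight clusters or pick the molecule closest to each cluster centre; the round-robin rule here is only the simplest option that keeps the selection diverse.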
