
Premium content
Access to this content requires a subscription. You must be a premium user to view this content.

Would you like to see your presentation here, made available to a global audience of researchers?
Add your own presentation or have us affordably record your next conference.
The chemical space of drug-like molecules is vast, motivating the development of generative models that must learn broad chemical distributions, enable conditional generation by capturing structure-property representations, and provide fast molecular generation. To address these challenges, we present STAR-VAE (SELFIES-encoded, Transformer-based, AutoRegressive Variational Auto Encoder), a scalable latent‑variable framework with a Transformer encoder and an autoregressive Transformer decoder. It is trained on 79 million drug-like molecules from PubChem, using SELFIES to guarantee syntactic validity. The latent-variable formulation enables conditional generation: a property predictor supplies a conditioning signal that is applied consistently to the latent prior, the inference network, and the decoder. Our contributions are: (i) a Transformer-based encoder-decoder model trained on SELFIES representations; (ii) a principled conditional latent-variable formulation for property-guided generation; and (iii) efficient finetuning with low-rank adapters (LoRA) in both encoder and decoder, enabling fast adaptation with limited property and activity data. Our approach demonstrates favorable performance on the GuacaMole, MOSES, and Tartarus benchmarks for both unconditional and conditional generation tasks. These results suggest that a modernized, scale-appropriate VAE remains competitive for molecular generation when paired with principled conditioning and parameter-efficient finetuning.
