Singapore

The chemical space of drug-like molecules is vast, motivating the development of generative models that must learn broad chemical distributions, enable conditional generation by capturing structure-property representations, and provide fast molecular generation. To address these challenges, we present STAR-VAE (SELFIES-encoded, Transformer-based, AutoRegressive Variational Autoencoder), a scalable latent-variable framework with a Transformer encoder and an autoregressive Transformer decoder. It is trained on 79 million drug-like molecules from PubChem, using SELFIES to guarantee syntactic validity. The latent-variable formulation enables conditional generation: a property predictor supplies a conditioning signal that is applied consistently to the latent prior, the inference network, and the decoder. Our contributions are: (i) a Transformer-based encoder-decoder model trained on SELFIES representations; (ii) a principled conditional latent-variable formulation for property-guided generation; and (iii) efficient finetuning with low-rank adapters (LoRA) in both encoder and decoder, enabling fast adaptation with limited property and activity data. Our approach demonstrates favorable performance on the GuacaMol, MOSES, and Tartarus benchmarks for both unconditional and conditional generation tasks. These results suggest that a modernized, scale-appropriate VAE remains competitive for molecular generation when paired with principled conditioning and parameter-efficient finetuning.

AAAI 2026

STAR-VAE: Latent Variable Transformers for Scalable and Controllable Molecular Generation

workshop paper
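
As a reading aid for the STAR-VAE abstract above, the textbook conditional-VAE objective has the shape sketched below. This is the standard form, not necessarily the paper's exact loss. The symbols are assumptions introduced here: x is the SELFIES token sequence, z the latent variable, c the conditioning signal from the property predictor, and θ, φ the decoder/prior and encoder parameters. The conditioning signal appears in all three places the abstract names: the prior, the inference network, and the decoder.

```latex
\mathcal{L}(\theta, \phi; x, c)
  = \mathbb{E}_{q_\phi(z \mid x, c)}\!\left[ \log p_\theta(x \mid z, c) \right]
  - \mathrm{KL}\!\left( q_\phi(z \mid x, c) \,\|\, p_\theta(z \mid c) \right),
\qquad
p_\theta(x \mid z, c) = \prod_{t=1}^{T} p_\theta(x_t \mid x_{<t}, z, c)
```

The product on the right is the autoregressive factorization over the T tokens of a sequence. Dropping c everywhere recovers the usual unconditional evidence lower bound.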

We are pleased to announce the Fortieth AAAI Conference on Artificial Intelligence (AAAI-26), which will be held at Singapore EXPO from January 20 to January 27, 2026.

The purpose of the AAAI conference series is to promote research in Artificial Intelligence (AI) and foster scientific exchange between researchers, practitioners, scientists, students, and engineers across the entirety of AI and its affiliated disciplines. AAAI-26 will feature technical paper presentations, special tracks, invited speakers, workshops, tutorials, poster sessions, senior member presentations, competitions, exhibit programs, and a range of other activities to be announced.




Accurate and scalable machine-learned inter-atomic potentials (MLIPs) are essential for molecular simulations ranging from drug discovery to new material design. Current state-of-the-art models enforce roto-translational symmetries through equivariant neural network architectures, a hard-wired inductive bias that can often lead to reduced flexibility, computational efficiency, and scalability. In this work, we introduce TransIP (Transformer-based Inter-Atomic Potentials), a novel training paradigm for interatomic potentials that achieves symmetry compliance without explicit architectural constraints. Our approach guides a generic non-equivariant Transformer-based model to learn SO(3)-equivariance by optimizing its representations in the embedding space. Trained on the recent Open Molecules (OMol25) collection, a large and diverse molecular dataset built specifically for MLIPs and covering different types of molecules (including small organics, biomolecular fragments, and electrolyte-like species), TransIP attains machine-learning force-field performance comparable to state-of-the-art equivariant baselines. Further, compared to a data augmentation baseline, TransIP achieves a 40% to 60% improvement in performance across varying OMol25 dataset sizes. More broadly, our work shows that learned equivariance can be a powerful and efficient alternative to equivariant or augmentation-based MLIP models.

Learning Inter-Atomic Potentials without Explicit Equivariance
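
The abstract does not spell out the TransIP training objective, so the sketch below shows only what SO(3)-equivariance means for a force predictor and how a violation could be measured. A model f is equivariant if rotating the input positions rotates the predicted forces in the same way. All names here (`equivariance_error`, `spring_forces`, `biased_forces`) are hypothetical toy helpers, not part of TransIP or OMol25.

```python
import numpy as np

def random_rotation(rng):
    """Draw a proper rotation matrix (det = +1) from the QR of a Gaussian matrix."""
    q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]  # flip one column to turn a reflection into a rotation
    return q

def equivariance_error(force_fn, positions, rotation):
    """Mean norm of f(X R^T) - f(X) R^T; zero for an SO(3)-equivariant f."""
    forces_of_rotated = force_fn(positions @ rotation.T)
    rotated_forces = force_fn(positions) @ rotation.T
    return float(np.linalg.norm(forces_of_rotated - rotated_forces, axis=1).mean())

def spring_forces(positions):
    """Toy equivariant model: every atom is pulled toward the centroid."""
    return positions.mean(axis=0) - positions

def biased_forces(positions):
    """Toy non-equivariant model: adds a fixed lab-frame direction."""
    return spring_forces(positions) + np.array([1.0, 0.0, 0.0])

rng = np.random.default_rng(0)
atoms = rng.normal(size=(5, 3))       # 5 atoms, Cartesian coordinates
rot = random_rotation(rng)
print(equivariance_error(spring_forces, atoms, rot))  # numerically ~0
print(equivariance_error(biased_forces, atoms, rot))  # clearly nonzero
```

An error of this kind, averaged over sampled rotations, is one plausible diagnostic for how well a non-equivariant model has learned the symmetry. Whether the paper uses such a quantity is not stated in the abstract.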

Small interfering RNA (siRNA) is a short double-stranded RNA molecule (~21–23 nucleotides) with the potential to cure diseases by silencing the function of target genes. Due to its well-understood mechanism, many siRNA-based drugs have been evaluated in clinical trials. However, selecting effective binding regions and designing siRNA sequences requires extensive experimentation, making the process costly. As genomic resources and publicly available siRNA datasets continue to grow, data-driven models can be leveraged to better understand siRNA–mRNA interactions. To fully exploit such data, curating high-quality siRNA datasets is essential to minimize experimental errors and noise. We propose siDPT: siRNA efficacy Prediction via Debiased Preference-Pair Transformer, a framework that constructs a preference-pair dataset and designs an siRNA–mRNA interactive transformer with debiased ranking objectives to improve siRNA inhibition prediction and generalization. We evaluate our approach using two public datasets and one newly collected patent dataset. Our model demonstrates substantial improvement in Pearson correlation and strong performance across other metrics.

siDPT: siRNA Efficacy Prediction via Debiased Preference-Pair Transformer

Learning path recommendation (LPR) is essential for alleviating information overload in large-scale online education. However, existing approaches often rely on static records of student histories and underexploit the semantic richness of learning resources, leading to recommendations that are misaligned with learners’ evolving knowledge states. In this work, we move from static records to dynamic contexts, introducing a context-driven framework that systematically acquires, organizes, compresses, and continuously updates multi-source information. Within this framework, Large Language Models (LLMs) interpret educational content and generate candidate paths, while Reinforcement Learning (RL) and Knowledge Tracing Models (KTM) iteratively refine recommendations through adaptive feedback. This context-driven perspective enhances adaptability, diversity, and explainability of LPR.
We validate our approach through extensive experiments on real-world datasets, including Math, Physics, and MOOPer. Results show that the proposed approach consistently outperforms strong baselines across multiple evaluation metrics, achieving significant gains in both learning promotion and diversity, while maintaining competitive efficiency.

Context-Driven Learning Path Recommendation: From Static Records to Dynamic Contexts

Elementary teachers face significant time burdens creating engaging, curriculum-aligned storytelling videos, often requiring hours of manual editing. We present a web-based platform that integrates a four-stage ML pipeline to automate video authoring from static drawings while preserving pedagogical control. Our system combines: (1) SAM-based character extraction with speech-bubble removal, (2) CLIP zero-shot character-text matching for story-guided segmentation, (3) LightGBM protagonist detection using visual saliency and narrative centrality features, and (4) emotion-aware motion generation synchronized with narrative context. Unlike general animation tools, our design prioritizes curriculum alignment through teacher-controllable story pacing, character emphasis, and narrative voice selection. In a user study with elementary teachers creating standards-aligned content, our platform substantially reduced authoring time compared to baseline video editing tools, with teachers reporting high creative control and strong satisfaction with pedagogical quality. Qualitative interviews revealed that AI-generated motion suggestions inspired teachers to explore new storytelling approaches they had not previously considered. This work demonstrates how context-aware, integrated ML systems can democratize educational content creation while amplifying rather than replacing teacher expertise.

Teacher-in-the-Loop Story-to-Video: Vision-Language Models and ML Ranking for Educational Content Authoring

Large Language Models (LLMs) are increasingly integrated into intelligent tutoring systems to provide human-like and adaptive instruction. However, most existing approaches fail to capture how students' knowledge evolves dynamically across their proficiencies, conceptual gaps, and forgetting patterns. This challenge is particularly acute in mathematics tutoring, where effective instruction requires fine-grained scaffolding precisely calibrated to each student's mastery level and cognitive retention. To address this issue, we propose TASA (Teaching According to Students' Aptitude), a student-aware tutoring framework that integrates persona, memory, and forgetting dynamics for personalized mathematics learning. Specifically, TASA maintains a structured student persona capturing proficiency profiles and an event memory recording prior learning interactions. By incorporating a continuous forgetting curve with knowledge tracing, TASA dynamically updates each student's mastery state and generates contextually appropriate, difficulty-calibrated questions and explanations. Empirical results demonstrate that TASA achieves superior learning outcomes and more adaptive tutoring behavior compared to representative baselines, underscoring the importance of modeling temporal forgetting and learner profiles in LLM-based tutoring systems.

Teaching According to Students' Aptitude: Personalized Mathematics Tutoring via Persona-, Memory-, and Forgetting-Aware LLMs

Sequential Recommender Systems (SRS) have been widely used in fields such as e-commerce and advertising. However, existing SRS models are often sub-optimal in educational scenarios, because stricter privacy regulations lead to higher sparsity in educational data. Thus, for better educational recommendation, we propose a novel multimodal SRS model, named OMSEE-REC (Multimodal Sequential Recommendation of On-line courses with MLLM-Enhanced Semantic Edge Embedding), that significantly enhances the effectiveness of course recommendations. Integrating multimodal information to enrich the learned representations alleviates the data sparsity problem. In OMSEE-REC, a multimodal large language model (MLLM) is utilized to transform non-text modalities, such as images, into refined textual representations. Then, we propose generating item semantic edge embeddings to enhance the semantic relations among courses, which fills the gap in cross-modal semantics. Next, to analyze users' long- and short-term preferences, the enhanced item information is input into MLLMs as prompts to derive user semantic edge embeddings. Finally, a non-intrusive attention mechanism is introduced into OMSEE-REC, which utilizes semantic edge embeddings as lightweight side information to guide embedding propagation. Empirical evaluations on dual-domain datasets, including large-scale general recommendation data (Amazon) and a private multimodal educational dataset, demonstrate the superior performance of OMSEE-REC, confirming its effectiveness in the fields of intelligent education and sequential recommendation modeling.

Multimodal Sequential Recommendation of On-line courses with MLLM-Enhanced Semantic Edge Embedding

Knowledge Tracing (KT) is a fundamental task that estimates a learner’s knowledge state and predicts future performance based on their past interactions. However, existing transformer-based KT models rely on simple index-distance-based attention decay, which fails to adequately capture the learner’s forgetting effects and the complex dependencies among past interactions. To address this limitation, we propose a novel positional encoding method, Reconstruction Attention Positional Encoding (RAPE), which incorporates Base-Level Activation (BLA), inspired by the ACT-R theory of cognitive psychology, into the attention mechanism. RAPE enhances prediction accuracy by jointly modeling the learner’s cognitive forgetting process and problem-solving sequence within the Transformer framework. Experiments on three public benchmark datasets demonstrate that RAPE consistently outperforms both traditional positional encoding approaches and state-of-the-art Relative Forgetting-Aware (RFA) models in terms of predictive accuracy. Our study presents a novel approach to integrating cognitive forgetting theory into neural KT, improving both interpretability and predictive performance in long-term learning environments, and promoting its potential application in intelligent educational systems.

Reconstruction Attention Positional Encoding for Knowledge Tracing: Integrating Cognitive Forgetting into Transformer-Based Models
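
For readers unfamiliar with the Base-Level Activation term that the RAPE abstract borrows from ACT-R, the sketch below computes the standard ACT-R quantity B = ln(Σ_j t_j^(-d)), where t_j is the time since the j-th past exposure and d is a decay rate (0.5 is the conventional ACT-R default). How RAPE feeds this value into attention is not described in the abstract, so the function name and interface here are illustrative assumptions only.

```python
import math

def base_level_activation(lags, decay=0.5):
    """ACT-R base-level activation: ln(sum over past exposures of lag ** -decay).

    lags  -- positive elapsed times since each earlier exposure to an item
    decay -- forgetting rate d; larger values mean faster forgetting
    """
    if not lags:
        raise ValueError("at least one past exposure is required")
    if any(lag <= 0 for lag in lags):
        raise ValueError("lags must be strictly positive")
    return math.log(sum(lag ** -decay for lag in lags))

# Recent, repeated practice yields higher activation than old practice.
recent = base_level_activation([1.0, 4.0])    # exposures 1 and 4 time units ago
stale = base_level_activation([4.0, 16.0])    # same count, but longer ago
print(recent > stale)
```

Unlike a decay that depends only on index distance, this quantity grows with the number of exposures and shrinks with elapsed time, which is the forgetting behavior the abstract says index-distance decay fails to capture.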

The rapid advancement of large-scale language models (LLMs) has shown their potential to transform intelligent education systems (IESs) through automated teaching and learning support applications. However, current IESs often rely on single-turn static question-answering, which fails to assess learners' cognitive levels, cannot adjust teaching strategies based on real-time feedback, and is limited to providing simple one-off responses. To address these issues, we introduce AgentTutor, a multi-turn interactive intelligent education system to empower personalized learning. It features an LLM-powered generative multi-agent system and a learner-specific personalized learning profile environment that dynamically optimizes and delivers teaching strategies based on learners' learning status, personalized goals, learning preferences, and multimodal study materials. It includes five key modules: curriculum decomposition, learner assessment, dynamic strategy, teaching reflection, and knowledge & experience memory. In extensive experiments on multiple benchmark datasets, AgentTutor significantly enhances learners' performance while demonstrating strong effectiveness in multi-turn interactions and competitive teaching quality relative to the baselines.

AgentTutor: Empowering Personalized Learning with Multi-Turn Interactive Teaching in Intelligent Education Systems

Synthetic data generation offers promise for addressing data scarcity and privacy concerns in educational technology, yet practitioners lack empirical guidance for selecting between traditional resampling techniques and modern deep learning approaches. This study presents the first systematic benchmark comparing these paradigms using a 10,000-record student performance dataset. We evaluate three resampling methods (SMOTE, Bootstrap, Random Oversampling) against three deep learning models (Autoencoder, Variational Autoencoder, Copula-GAN) across multiple dimensions: distributional fidelity (Kolmogorov-Smirnov distance, Jensen-Shannon divergence), machine learning utility (Train-on-Synthetic-Test-on-Real scores), and privacy preservation (Distance to Closest Record). Our findings reveal a fundamental trade-off: resampling methods achieve near-perfect utility (TSTR: 0.997) but completely fail privacy protection (DCR: 0.00), while deep learning models provide strong privacy guarantees (DCR: 1.00) at significant utility cost. Variational Autoencoders emerge as the optimal compromise, maintaining 83.3% predictive performance while ensuring complete privacy protection. We provide actionable recommendations: use traditional resampling for internal development where privacy is controlled, and VAEs for external data sharing where privacy is paramount. This work establishes a foundational benchmark and practical decision framework for synthetic data generation in learning analytics.

Synthetic Data in Education: Empirical Insights from Traditional Resampling and Deep Generative Models
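
Two of the metrics named in the abstract above are easy to state concretely. The sketch below is a minimal NumPy version of Distance to Closest Record (DCR) and the two-sample Kolmogorov-Smirnov distance, run on made-up Gaussian data rather than the study's dataset. The function names are hypothetical, and the paper's exact normalization of DCR (reported there on a 0 to 1 scale) is not given in the abstract and is not reproduced here.

```python
import numpy as np

def distance_to_closest_record(synthetic, real):
    """For each synthetic row, the Euclidean distance to its nearest real row."""
    synthetic = np.asarray(synthetic, dtype=float)
    real = np.asarray(real, dtype=float)
    diffs = synthetic[:, None, :] - real[None, :, :]   # shape (n_syn, n_real, n_features)
    return np.sqrt((diffs ** 2).sum(axis=2)).min(axis=1)

def ks_distance(a, b):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size
    return float(np.abs(cdf_a - cdf_b).max())

rng = np.random.default_rng(0)
real = rng.normal(size=(200, 4))                        # stand-in for real student records
bootstrap = real[rng.integers(0, len(real), size=200)]  # resampling copies real rows
noisy = real + rng.normal(scale=0.5, size=real.shape)   # crude stand-in for a generative model

print(distance_to_closest_record(bootstrap, real).max())   # 0.0: every row is an exact copy
print(distance_to_closest_record(noisy, real).mean())      # > 0: no exact copies
print(ks_distance(real[:, 0], noisy[:, 0]))                # marginal fidelity of feature 0
```

Because bootstrap sampling reuses real rows verbatim, its DCR is exactly zero, which is consistent with the privacy failure the abstract reports for resampling methods.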

Teachable agents powered by AI offer a promising approach to enhance the engagement and understanding of middle school students, who often face challenges in grasping mathematical concepts and procedures. However, during student-AI interactions, it is essential to determine when the agent should stop to effectively regulate the cognitive and emotional states of the students, which are factors closely linked to positive participation and productive tutoring strategies. This raises a key question: who should decide when the teachable agent stops or continues? Empirical evidence highlights the advantages of using LLM-as-a-judge and knowledge-graph–based decision-making, as well as the potential benefits of fixed-turn conversations. We draw on 64,060 messages from 7,991 conversations across four randomly assigned stopping mechanisms: 8 turns, 16 turns, a standalone LLM-as-a-judge, and agent decisions with knowledge graphs (KGs). Our experimental results show that (a) agent decisions informed by KGs were most effective at detecting when conversations should continue, sustaining learner engagement with the teachable agent; (b) off-topic utterances occurred more frequently under fixed-turn conditions; and (c) tutoring strategies such as questioning for help or evaluation and elaborating with justification were most prevalent when agents used KGs for decision-making. Practical implications for designing effective agent-student interactions are discussed.
