Singapore

Speculative decoding accelerates LLM inference by utilizing otherwise idle computational resources during memory-to-chip data transfer.
Current speculative decoding methods typically assume a considerable amount of available computing power, then generate a complex and massive draft tree using a small autoregressive language model to improve overall prediction accuracy. However, methods like batching have been widely applied in mainstream model inference systems as a superior alternative to speculative decoding, as they compress the available idle computing power. Therefore, performing speculative decoding with low verification resources and low scheduling costs has become an important research problem.
We believe that more capable models that allow for parallel generation on draft sequences are what we truly need. Recognizing the fundamental nature of draft models to only generate sequences of limited length, we propose SpecFormer.
a novel architecture that integrates unidirectional and bidirectional attention mechanisms. SpecFormer combines the autoregressive model’s ability to extract information from the entire input sequence with the parallel generation benefits of non-autoregressive models. This design eliminates the reliance on large prefix trees and achieves consistent acceleration, even in large-batch scenarios. Through lossless speculative decoding experiments across models of various scales, we demonstrate that SpecFormer sets a new standard for scaling LLM inference with lower training demands and reduced computational costs.

AAAI 2026

Scaling LLM Speculative Decoding: Non-Autoregressive Forecasting in Large-Batch Scenarios

technical paper

We are pleased to announce the Fortieth AAAI Conference on Artificial Intelligence (AAAI-26), which will be held in Singapore EXPO from January 20 to January 27, 2026.

The purpose of the AAAI conference series is to promote research in Artificial Intelligence (AI) and foster scientific exchange between researchers, practitioners, scientists, students, and engineers across the entirety of AI and its affiliated disciplines. AAAI-26 will feature technical paper presentations, special tracks, invited speakers, workshops, tutorials, poster sessions, senior member presentations, competitions, and exhibit programs, and a range of other activities to be announced.<br><br>

To access this event page, you need to log in with the **email address you registered with**. <br>Access credentials will be sent to your email from Underline -  subject line "Welcome to AAAI 2026". Please be sure to check your spam email folder if you do not see an email confirmation right away.

Please log in

To access this event page, you are required to register.
Please complete your registration to continue.

We recommend reading [**the registration information**](https://aaai.org/conference/aaai/aaai-26/registration/) first.

**Online Registration Form**: https://aaai.getregistered.net/conference-2026 

Registration Required

We are pleased to announce the Fortieth AAAI Conference on Artificial Intelligence (AAAI-26), which will be held in Singapore EXPO from January 20 to January 27, 2026.

Oracle Bone Script, East Asia's earliest mature writing system from over 3,500 years ago, encodes ancient cognition through visual metaphors, yet remains largely undeciphered and inaccessible, severing modern society from its cultural roots. Traditional AI methods, while accurate in classification, treat glyphs as opaque data, neglecting their pictographic essence and failing to foster public understanding—exacerbating a heritage crisis amid linguistic evolution.
We pioneer a paradigm shift toward AI-driven cultural democratization, introducing OracleVis, the first human-validated multimodal dataset of glyph-image-explanation triplets, curated through expert collaborations to overcome data scarcity, bias, and incompleteness in archaeological sources. Building on this, OBS-VM, an explainability-centric multimodal large language model fine-tuned on Qwen 2-VL-7B, models pictographic reasoning by balancing semantic fidelity with interpretive transparency, transforming black-box predictions into cognition-aligned narratives.
Rigorous evaluations, including benchmarks and a user study with 24 non-experts, reveal our system's superiority: it outperforms GPT-4o by 35.3\% in pictographic rationality and 13.8\% in recognition accuracy, while interactive learning boosts knowledge gains (+5.5 vs. +1.7), interest (+1.9 vs. +0.4), and confidence (+2.0 vs. +0.3) over static methods. This work illuminates AI's potential to bridge ancient wisdom and contemporary audiences, redefining heritage preservation as an inclusive, socially impactful endeavor that turns cultural alienation into enlightened engagement.

Explainable Oracle Bone Script Recognition via Multimodal Pictographic Reasoning

Spatially multimodal omics technologies provide unprecedented opportunities to address cellular heterogeneity within tissue contexts. However, learning robust and informative latent representations from such complex data remains a significant challenge. Existing graph-based methods often rely on static connections or indirect optimization objectives, which can constrain the discriminability and diversity of the learned representations, particularly in the presence of sequencing noise and unknown biological priors. To overcome these limitations, we propose Robust Integrative Analysis of Multi-omics Datasets via Nuclear-norm Maximization (RIA) to adaptively integrate multimodal features and spatial information through a new graph-based architecture. At the core of RIA is the introduction of the batch nuclear norm maximization (bnm) loss, marking the first application of bnm within the multi-omics domain. By maximizing the nuclear norm of the batch assignment matrix derived from the latent space, RIA simultaneously enhances the discriminability and diversity of the learned embeddings. This objective is synergistically combined with a dynamic prototype contrastive learning strategy and a graph stability loss, ensuring comprehensive and robust optimization.Ultimately, RIA produces a structured, information-rich latent space that enables more reliable downstream analyses, including cell type identification, spatial domain discovery, and microenvironment characterization.

Robust Integrative Analysis of Multi-omics Datasets via Nuclear-norm Maximization

Cross-tokenizer knowledge distillation, where the teacher and student employ different tokenizers, is becoming increasingly prevalent, yet it poses underexplored challenges: existing methods fail to capture the rich knowledge encoded in teacher logits, as evidenced by the neglect of semantic information, inaccurate and biased logit alignment, and discarding distributional structure—ultimately leading to unfavorable distillation. To address these issues, we propose SeDi, a semantics and distribution-aware knowledge transfer framework tailored for cross-tokenizer distillation. To preserve factual knowledge, SeDi employs bipartite graph-based alignment at the tokenization level and a sliding window re-encoding strategy at the vocabulary level, enabling unbiased transfer of the teacher’s next-token predictions into the student’s vocabulary space. To further retain distributional information, we align the student’s entropy with that of the teacher by incorporating the student’s own logits during training, which helps to mitigate the exposure bias problem. Experiments on ten datasets across three task domains and five different teacher-student model pairs with varying vocabulary sizes demonstrate that SeDi delivers substantial improvements, with gains of up to 19.8%.

Bridging the Tokenizer Gap: Semantics and Distribution-aware Knowledge Transfer for Unbiased Cross-Tokenizer Distillation

Multi-view Learning (MVL) is a pivotal paradigm widely adopted across various fields. Despite recent advances, significant challenges remain. Existing methods primarily focus on enhancing the performance of fused multi-view representation, often neglecting the issue of Representation Degradation (RD) arising from discrepancies in the intrinsic quality of different views. To address the limitations, we propose a novel Granular-ball Fuzzy Split and Attention Fusion (GFSAF) learning, which leverages the nature of granular-ball to extract mutual and complementary representation separately. Meanwhile, the proposed method introduces an attention variant for fused representations to mitigate the RD problem. Specifically, GFSAF mainly consists of two training stages: Split-Extract Stage (SES) and Views-Fusion Stage (VFS). First, in SES, we design a novel Granular-ball Fuzzy Contrastive Learning (GFCL) to extract mutual representation, and introduce Noise Stripping Loss (L_NS) to reduce the influence of noise for complementary representation. Next, a novel multi-head Cross Views Attention (CVA) is proposed to employ attention mechanism from multi-view perspectives for comprehensive fused representations. Experimental results on eight databases demonstrate that GFSAF achieves superior performance compared to several state-of-the-art methods. Code is available at the supplementary materials.

Views Attention Fusion of Granular-ball Fuzzy Representations Split for Improved Multi-view Clustering

Open-Vocabulary Remote Sensing Image Segmentation (OVRSIS), an emerging task that adapts Open-Vocabulary Segmentation (OVS) to the remote sensing (RS) domain, remains underexplored due to the absence of a unified evaluation benchmark and the domain gap between natural and RS images.
To bridge these gaps, we first establish a standardized OVRSIS benchmark (**OVRSISBench**) based on widely-used RS segmentation datasets, enabling consistent evaluation across methods. Using this benchmark, we comprehensively evaluate several representative OVS/OVRSIS models and reveal their limitations when directly applied to remote sensing scenarios.
Building on these insights, we propose **RSKT-Seg**, a novel open-vocabulary segmentation framework tailored for remote sensing. RSKT-Seg integrates three key components: (1) a Multi-Directional Cost Map Aggregation (RS-CMA) module that captures rotation-invariant visual cues by computing vision-language cosine similarities across multiple directions; (2) an Efficient Cost Map Fusion (RS-Fusion) transformer, which jointly models spatial and semantic dependencies with a lightweight dimensionality reduction strategy; and (3) a Remote Sensing Knowledge Transfer (RS-Transfer) module that injects pre-trained knowledge and facilitates domain adaptation via enhanced upsampling.
Extensive experiments on the benchmark show that RSKT-Seg consistently outperforms strong OVS baselines by +3.8 mIoU and +5.9 mACC, while achieving 2× faster inference through efficient aggregation. Our anonymous code is in the appendix.

Exploring Efficient Open-Vocabulary Segmentation in the Remote Sensing

Diffusion models show promise for 3D molecular generation, but face a fundamental trade-off between sampling efficiency and conformational accuracy. While flow-based models are fast, they often produce geometrically inaccurate structures, as they have difficulty capturing the multimodal distributions of molecular conformations. In contrast, denoising diffusion models are more accurate but suffer from slow sampling, a limitation attributed to sub-optimal integration between diffusion dynamics and SE(3)‑equivariant architectures. 
To address this, we propose **VEDA**, a unified SE(3)-equivariant framework that combines variance-exploding diffusion with annealing to efficiently generate conformationally accurate 3D molecular structures. Specifically, our key technical contributions include: (1) a VE schedule that enables noise injection functionally analogous to simulated annealing, improving 3D accuracy and reducing relaxation energy; (2) a novel preconditioning scheme that reconciles the coordinate-predicting nature of SE(3)-equivariant networks with a residual-based diffusion objective, and (3) a new arcsin-based scheduler that concentrates sampling in critical intervals of the logarithmic signal-to-noise ratio. 
On the QM9 and GEOM-DRUGS datasets, VEDA matches the sampling efficiency of flow-based models, achieving state-of-the-art valency stability and validity with only 100 sampling steps. More importantly, VEDA's generated structures are remarkably stable, as measured by their relaxation energy ($\Delta E_{{relax}}$) during GFN2-xTB optimization. The median energy change is only 1.72kcal/mol, significantly lower than the 32.3kcal/mol from its architectural baseline, SemlaFlow. Our framework demonstrates that principled integration of VE diffusion with SE(3)-equivariant architectures can achieve both high chemical accuracy and computational efficiency.

VEDA: Generation of 3D Molecules via Variance-Exploding Diffusion with Annealing

Realistic background traffic is critical to the simulation platforms for autonomous driving (AD) testing. Given most of vehicles in reality are driven by human beings, introducing human driving (HD) vehicles to the background traffic is necessary and able to discover more problems of tested AD vehicle in the simulation stage. However, existing methods rely on ad-hoc rules or data-driven trainings to mimic partial human driver behaviors, which is not comprehensive and lack of transparency. In this work, we design a smart human driving vehicle simulator HDSim which is empowered by cognitively inspired modeling and AI models. HDSim enables diverse, realistic and scalable HD traffic simulation on AD testing platforms like Carla in a non-intrusive measure. There are two novel components in HDSim. First, we introduce a formalized model to guide the generation of diverse human driving styles by using different combinations of latent cognitive factors in a hierarchy. Second, we design a Perception-Mediated Behavior Influence (PMBI) mechanism to uses LLM-assisted perceptual transformations to indirectly fuse driving actions with driving styles. Experiments show that HDSim traffic can help simulation platforms like Carla to reveal 68% more failures of tested AD vehicles, and the explainability of reported accidents is also improved.

Diverse Human Driving Vehicle Simulation in Background Traffic for Autonomous Driving Tests

Molecular representation learning, a cornerstone for downstream tasks like molecular captioning and molecular property prediction, heavily relies on Graph Neural Networks (GNN). However, GNN suffers from the over-smoothing problem, where node-level features collapse in deep GNN layers. While existing feature projection methods with cross-attention have been introduced to mitigate this issue, they still perform poorly in deep features. This motivated our exploration of using Mamba as an alternative projector for its ability to handle complex sequences. However, we observe that while Mamba excels at preserving global topological information from deep layers, it neglects fine-grained details in shallow layers. The capabilities of mamba and cross-attention exhibit a global-local trade-off. To resolve this critical global-local trade-off, we propose Hierarchical and Structure-Aware Network (HSA-Net), a novel framework with two modules that enables a hierarchical feature projection and fusion. Firstly, a Hierarchical Adaptive Projector (HAP) module is introduced to process features from different graph layers. It learns to dynamically switch between a cross attention projector for shallow layers and a structure-aware Graph-Mamba projector for deep layers, producing high-quality, multi-level features. Secondly, to adaptively merge these multi-level features, we design a Source-Aware Fusion (SAF) Module, which flexibly selects fusion experts based on the characteristics of the aggregation features, ensuring a precise and effective final representation fusion. Extensive experiments demonstrate that our HSA-Net framework quantitatively and qualitatively outperforms current state-of-the-art (SOTA) methods. The demos of our method are attached in the supplementary material.

HSA-Net: Hierarchical and Structure-Aware Framework for Efficient and Scalable Molecular Language Modeling

Recent advances in large reasoning models (LRMs) have significantly enhanced long-chain reasoning capabilities over standard large language models (LLMs). However, LRMs often produce unnecessarily lengthy outputs even for simple queries, leading to inefficiencies or even accuracy degradation compared to LLMs. To address this, we propose CP-Router, a training-free, model-agnostic routing framework that dynamically selects between an LLM and an LRM, demonstrated with multiple-choice question answering (MCQA) prompts. The routing decision is guided by the prediction uncertainty estimates derived via Conformal Prediction (CP), which provides rigorous coverage guarantees. To improve uncertainty differentiation across inputs, we introduce Full and Binary Entropy (FBE), a novel entropy-based criterion that adaptively selects the appropriate CP threshold. Experiments across MCQA and QA benchmarks—including mathematics, logical reasoning, and Chinese chemistry—demonstrate that CP-Router efficiently reduces token usage while maintaining or even improving accuracy compared to using LRM alone. We further demonstrate the generality and robustness of CP-Router by extending it to diverse model pairings beyond the LLM–LRM setting.

CP-Router: An Uncertainty-Aware Router Between LLM and LRM

Configuring the parameters of additive manufacturing processes for metal alloys is a challenging problem due to complex relationships between input parameters (e.g., laser power, scan speed, and material feed rate) and quality of printed outputs. The standard trial-and-error approach to find feasible parameter configurations is highly inefficient because validating each input configuration is expensive in terms of resources (physical and human labor) and the configuration space is very large. This paper applies the general principle of AI-driven adaptive experimental design for optimization to the more challenging problem of discovering feasible configurations. The key idea is to build a probabilistic surrogate model from past experiments to intelligently select a small batch of input configurations for validation in each iteration. To demonstrate the effectiveness of this methodology, we deploy it for Directed Energy Deposition (DED) process to print GRCop-42, a high-performance copper–chromium–niobium alloy developed by NASA for extreme-temperature aerospace applications. Within weeks, our approach yielded multiple defect-free outputs across a range of laser powers—dramatically reducing time-to-result and resource expenditure compared to four months of manual experimentation by our collaborators with little to no success. By enabling high-quality GRCop-42 fabrication on readily available infrared laser platforms for the first-time, we democratize access to this critical alloy, paving the way for cost-effective, decentralized production of rocket engine chambers, heat exchangers, and other high-heat-flux components.

Premium content

Downloads

Next from AAAI 2026

Explainable Oracle Bone Script Recognition via Multimodal Pictographic Reasoning

Stay up to date with the latest Underline news!

PRESENTATIONS

CONFERENCES

COMPANY

RESOURCES