Singapore

We train Transformer-based language models on ten foundational algorithmic tasks and observe pronounced phase transitions in their loss curves that deviate from established power-law scaling trends. Over large ranges of compute, the validation loss barely improves, then abruptly decreases. Probing the models’ internal representations reveals that quiet features are learned prior to any decrease in task loss. These quiet features represent intermediate algorithmic computations that do not by themselves improve the output loss. Ablation experiments demonstrate that individual quiet features are causally necessary for task performance. Our results demonstrate that substantial representational progress can remain hidden beneath an apparently flat loss curve, challenging the prevailing use of cross‑entropy as a proxy for learning and motivating richer diagnostics for monitoring model training.

AAAI 2026

Quiet Feature Learning in Algorithmic Tasks

technical paper

We are pleased to announce the Fortieth AAAI Conference on Artificial Intelligence (AAAI-26), which will be held in Singapore EXPO from January 20 to January 27, 2026.

The purpose of the AAAI conference series is to promote research in Artificial Intelligence (AI) and foster scientific exchange between researchers, practitioners, scientists, students, and engineers across the entirety of AI and its affiliated disciplines. AAAI-26 will feature technical paper presentations, special tracks, invited speakers, workshops, tutorials, poster sessions, senior member presentations, competitions, exhibit programs, and a range of other activities to be announced.

Large language models (LLMs) are transforming the field of
natural language processing, yet their development remains
concentrated on a handful of high-resource languages,
raising fundamental questions of inclusivity, trust, and
global accessibility. My research addresses these
challenges by advancing multilingual and trustworthy AI. On
the multilingual front, I have analyzed how LLMs internally
process diverse languages, introduced benchmarks such as
M3Exam and SeaBench to reveal performance gaps, and led
large-scale open-source initiatives including SeaLLMs and
Babel that extend strong model support to underrepresented
languages worldwide. Complementing inclusivity, my work
also uncovers vulnerabilities in LLMs (e.g., multilingual
jailbreaks) and introduces neuron-level interpretability
and automated evaluation frameworks (e.g., Auto-Arena) for
trustworthy deployment. Looking ahead, I aim to build AI
systems that are linguistically inclusive, culturally
aware, and inherently safe, bridging foundational advances
with real-world applications in diverse global contexts.

Towards Inclusive AI: Advancing Multilingual Large Language Models

Slum segmentation from satellite imagery holds significant promise in generating consistent global estimates of urban poverty. However, the morphological heterogeneity of informal settlements presents a major challenge, limiting the generalization of models trained on specific regions to unseen locations. To address this, we introduce a large-scale high-resolution dataset and propose GRAM (Generalized Region-Aware Mixture-of-Experts), a two-phase test-time adaptation framework that enables robust slum segmentation without labeled data from target regions. We compile a million-scale dataset of preprocessed satellite imagery from 12 cities across four continents for source training. Using this dataset, GRAM employs a mixture-of-experts architecture to capture region-specific slum characteristics while learning universal features through a shared backbone. During adaptation, prediction consistency across experts filters unreliable pseudo-labels, allowing the model to generalize effectively to previously unseen regions. When tested on three African cities, GRAM outperforms state-of-the-art baselines in low-resource settings, offering a scalable and label-efficient solution for global slum mapping and data-driven urban planning.

Generalizable Slum Detection from Satellite Imagery with Mixture-of-Experts
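
The GRAM abstract above says that prediction consistency across experts is used to filter unreliable pseudo-labels. A minimal sketch of that general idea follows, assuming hard per-pixel labels from each expert and a simple majority-agreement rule. The function name `filter_pseudo_labels`, the `min_agreement` threshold, and the `ignore_label` value are hypothetical and are not taken from the paper, whose actual criterion may differ.

```python
from collections import Counter


def filter_pseudo_labels(expert_preds, min_agreement=1.0, ignore_label=-1):
    """Keep a pseudo-label only where enough experts agree on it.

    expert_preds: list of per-expert label lists, all of the same length
                  (one label per pixel). Hypothetical input layout.
    min_agreement: fraction of experts that must share the majority label.
    ignore_label: value written where agreement is too low.
    """
    n_experts = len(expert_preds)
    n_pixels = len(expert_preds[0])
    labels = []
    for i in range(n_pixels):
        # Count how many experts voted for each label at pixel i.
        votes = Counter(preds[i] for preds in expert_preds)
        label, count = votes.most_common(1)[0]
        if count / n_experts >= min_agreement:
            labels.append(label)
        else:
            labels.append(ignore_label)
    return labels


if __name__ == "__main__":
    # Three experts, four pixels (1 = slum, 0 = non-slum).
    preds = [
        [1, 0, 1, 0],
        [1, 0, 0, 0],
        [1, 1, 1, 0],
    ]
    print(filter_pseudo_labels(preds, min_agreement=1.0))
    print(filter_pseudo_labels(preds, min_agreement=0.6))
```

With `min_agreement=1.0` only unanimous pixels keep a label. Lowering the threshold trades pseudo-label coverage against noise.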

This paper presents an eXplainable AI (XAI)-based classroom game “Breakable Machine” for teaching critical, transformative AI literacy through adversarial play and interrogation of AI systems. Designed for learners aged 10–15, the game invites students to spoof an image classifier by manipulating their appearance or environment in order to trigger high-confidence misclassifications. Rather than focusing on building AI models, this activity centers on breaking them, exposing their brittleness, bias, and vulnerability through hands-on, embodied experimentation. The game includes an XAI view to help students visualize feature saliency, revealing how models attend to specific visual cues. A shared classroom leaderboard fosters collaborative inquiry and comparison of strategies, turning the classroom into a site for collective sensemaking. This approach repositions AI education by treating model failure and misclassification not as problems to be debugged, but as pedagogically rich opportunities to interrogate AI as a sociotechnical system. In doing so, the game supports students in developing data agency, ethical awareness, and a critical stance toward AI systems increasingly embedded in everyday life. The game and its source code are freely available.

Breakable Machine: A K–12 Classroom Game for Transformative AI Literacy Through Spoofing and eXplainable AI (XAI)

Dataset distillation creates a small distilled set that enables efficient training by capturing key information from the full dataset. While existing dataset distillation methods perform well on balanced datasets, they struggle under long-tailed distributions, where imbalanced class frequencies induce biased model representations and corrupt statistical estimates such as Batch Normalization (BN) statistics. In this paper, we rethink long-tailed dataset distillation by revisiting the limitations of trajectory-based methods, and instead adopt the statistical alignment perspective to jointly mitigate model bias and restore fair supervision. To this end, we introduce three dedicated components that enable unbiased recovery of distilled images and soft relabeling: (1) enhancing expert models (an observer model for recovery and a teacher model for relabeling) to enable reliable statistics estimation and soft-label generation; (2) recalibrating BN statistics via a full forward pass with dynamically adjusted momentum to reduce representation skew; (3) initializing synthetic images by incrementally selecting high-confidence and diverse augmentations via a multi-round mechanism that promotes coverage and diversity. Extensive experiments on four long-tailed benchmarks show consistent improvements over state-of-the-art methods across varying degrees of class imbalance. Notably, our approach improves top-1 accuracy by 15.6% on CIFAR-100-LT and 11.8% on Tiny-ImageNet-LT under IPC=10 and IF=10.

Rethinking Long-tailed Dataset Distillation: A Uni-Level Framework with Unbiased Recovery and Relabeling

3D Gaussian Splatting (3DGS) achieves impressive rendering fidelity and speed for novel view synthesis. However, its substantial data size poses a significant challenge for practical applications. While many compression techniques have been proposed, they lack progressivity and therefore cannot efficiently reuse existing bitstreams in on-demand applications, which wastes resources.
To address this issue, we propose __PCGS__ (Progressive Compression of 3D Gaussian Splatting), which adaptively controls __both the quantity and quality__ of Gaussians (or anchors) to enable effective progressivity for on-demand applications. For quantity, we introduce a progressive masking strategy that incrementally incorporates new anchors while refining existing ones to enhance fidelity. For quality, we propose a progressive quantization approach that gradually reduces quantization step sizes to achieve finer modeling of Gaussian attributes. Furthermore, to compact the incremental bitstreams, we leverage existing quantization results to refine probability prediction, improving entropy coding efficiency across progressive levels.
Overall, PCGS achieves progressivity while maintaining compression performance comparable to SoTA non-progressive methods. Our code will be made publicly available.

PCGS: Progressive Compression of 3D Gaussian Splatting
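
The PCGS abstract above describes progressive quantization as gradually reducing quantization step sizes so that each level models attributes more finely. The sketch below illustrates that general idea with a generic residual scheme on a single scalar: each level quantizes whatever error the previous levels left, using a smaller step. This is an assumption made for illustration, not the method from the paper, and the function name `progressive_reconstructions` is hypothetical.

```python
def progressive_reconstructions(value, steps):
    """Return the reconstruction of `value` after each progressive level.

    steps: quantization step sizes, typically decreasing (coarse to fine).
    Each level quantizes the residual left by earlier levels, so a decoder
    that stops after any level still has a usable approximation.
    """
    recon = 0.0
    history = []
    for step in steps:
        residual = value - recon
        # Uniform quantization of the residual to the nearest multiple of step.
        recon += round(residual / step) * step
        history.append(recon)
    return history


if __name__ == "__main__":
    value = 0.73
    steps = [0.5, 0.25, 0.125]
    for step, recon in zip(steps, progressive_reconstructions(value, steps)):
        print(f"step={step}: reconstruction={recon}, error={abs(value - recon):.4f}")
```

After each level the reconstruction error is at most half of that level's step size, which is why finer steps give finer reconstructions while earlier, coarser levels remain decodable on their own.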

Large Language Models (LLMs) are increasingly used as scalable tools for pilot testing, predicting public opinion distributions before deploying costly surveys. However, the prevailing paradigm for evaluating these models relies on traditional structured surveys, a methodology misaligned with more realistic settings such as social media, where opinions are embedded in rich digital context. By design, surveys strip away the social and cultural context that shapes public opinion, and LLM benchmarks built on this paradigm inherit these critical limitations. To bridge this gap, we introduce MindVote, the first benchmark for public opinion prediction grounded in authentic social media discourse. MindVote is constructed from 3,918 naturalistic polls sourced from Reddit and Weibo, spanning 23 topics and enriched with detailed annotations for platform and topical context. Using this benchmark, we conduct a comprehensive evaluation of 15 LLMs, revealing a critical "survey-based specialization pitfall" where models fine-tuned on traditional surveys underperform their general-purpose counterparts and demonstrating the necessity of context in social media. MindVote provides a robust, ecologically valid framework to move beyond survey-based evaluations and advance the development of socially intelligent AI systems.

MindVote: When AI Meets the Wild West of Social Media Opinion

Multi-view diabetic retinopathy (DR) grading has achieved remarkable performance by capturing more comprehensive pathological features than single-view methods. However, complete multi-view fundus images are often difficult to obtain in clinical practice, and the performance degrades significantly when fewer views are available. To overcome this limitation, we propose the first incomplete multi-view DR grading framework, aiming to provide accurate diagnosis regardless of the number of available views. It introduces two novel modules. First, cross-view spatial correlation attention (CSCA) captures region correlations across views, automatically identifying and fusing diagnostically relevant spatial features to improve feature representation. Second, self-supervised mask consistency learning (SMCL) formulates a novel pretext task of missing-view information reconstruction by strategically masking inter- and intra-view regions, enabling the model to infer complete features from incomplete views. Benefiting from CSCA and SMCL, our method enhances structural feature consistency across views and effectively compensates for missing information during DR grading. Extensive experiments demonstrate that our method achieves state-of-the-art grading performance, particularly under realistic conditions where some views are unavailable.

Incomplete Multi-view Diabetic Retinopathy Grading via Self-Supervised Inter- and Intra-View Restoration

Graph Domain Adaptation (GDA) facilitates knowledge transfer from labeled source graphs to unlabeled target graphs by learning domain-invariant representations, which is essential in applications such as molecular property prediction and social network analysis. However, most existing GDA methods rely on the assumption of clean source labels, which rarely holds in real-world scenarios where annotation noise is pervasive. This label noise severely impairs feature alignment and degrades adaptation performance under domain shifts. To address this challenge, we propose Nested Graph Pseudo-Label Refinement (NeGPR), a novel framework tailored for graph-level domain adaptation with noisy labels. NeGPR first pretrains dual branches, i.e., semantic and topology branches, by enforcing neighborhood consistency in the feature space, thereby reducing the influence of noisy supervision. To bridge domain gaps, NeGPR employs a nested refinement mechanism in which one branch selects high-confidence target samples to guide the adaptation of the other, enabling progressive cross-domain learning. Furthermore, since pseudo-labels may still contain noise and the pretrained branches are already overfitted to the noisy labels in the source domain, NeGPR incorporates a noise-aware regularization strategy. This regularization is theoretically proven to mitigate the adverse effects of pseudo-label noise, even in the presence of source overfitting, thus enhancing the robustness of the adaptation process. Extensive experiments on benchmark datasets demonstrate that NeGPR consistently outperforms state-of-the-art methods under severe label noise.

Nested Graph Pseudo-Label Refinement for Noisy Label Domain Adaptation Learning

Single-cell RNA sequencing (scRNA-seq), especially temporally resolved datasets, enables genome-wide profiling of gene expression dynamics at single-cell resolution across discrete time points. However, current technologies provide only sparse, static snapshots of cell states and are inherently influenced by technical noise, complicating the inference and representation of continuous transcriptional dynamics. Although embedding methods can reduce dimensionality and mitigate technical noise, the majority of existing approaches typically treat trajectory inference separately from embedding construction, often neglecting temporal structure. To address this challenge, here we introduce CellStream, a novel deep learning framework that jointly learns embedding and cellular dynamics from single-cell snapshot data by integrating an autoencoder with unbalanced dynamical optimal transport. Compared to existing methods, CellStream generates dynamics-informed embeddings that robustly capture temporal developmental processes while maintaining high consistency with the underlying data manifold. We demonstrate CellStream’s effectiveness on both simulated datasets and real scRNA-seq data, including spatial transcriptomics. Our experiments indicate significant quantitative improvements over state-of-the-art methods in representing cellular trajectories with enhanced temporal coherence and reduced noise sensitivity. Overall, CellStream provides a new tool for learning and representing continuous streams from the noisy, static snapshots of single-cell gene expression.

CellStream: Dynamical Optimal Transport Informed Embeddings for Reconstructing Cellular Trajectories from Snapshots Data

Autoformalization aims to translate natural-language mathematical statements into a formal language. While LLMs have accelerated progress in this area, existing methods still suffer from low accuracy. We identify two key abilities for effective autoformalization: comprehensive mastery of formal-language domain knowledge, and the reasoning capability needed for natural-language problem understanding and informal-formal alignment. Without the former, a model cannot identify the correct formal objects; without the latter, it struggles to interpret real-world contexts and map them precisely into formal expressions. To address these gaps, we introduce ThinkingF, a data synthesis and training pipeline that improves both abilities. First, we construct two datasets: one by distilling and selecting large-scale examples rich in formal knowledge, and another by generating informal-to-formal reasoning trajectories guided by expert-designed templates. We then apply SFT and RLVR with these datasets to further fuse and refine the two abilities. The resulting 7B and 32B models exhibit both comprehensive formal knowledge and strong informal-to-formal reasoning. Notably, StepFun-Formalizer-32B achieves SOTA BEq@1 scores of 40.5% on FormalMATH-Lite and 26.7% on ProverBench, surpassing all prior general-purpose and specialized models.
