Singapore

Slum segmentation from satellite imagery holds significant promise for generating consistent global estimates of urban poverty. However, the morphological heterogeneity of informal settlements presents a major challenge, limiting the generalization of models trained on specific regions to unseen locations. To address this, we introduce a large-scale high-resolution dataset and propose GRAM (Generalized Region-Aware Mixture-of-Experts), a two-phase test-time adaptation framework that enables robust slum segmentation without labeled data from target regions. We compile a million-scale dataset of preprocessed satellite imagery from 12 cities across four continents for source training. Using this dataset, GRAM employs a mixture-of-experts architecture to capture region-specific slum characteristics while learning universal features through a shared backbone. During adaptation, prediction consistency across experts filters unreliable pseudo-labels, allowing the model to generalize effectively to previously unseen regions. When tested on three African cities, GRAM outperforms state-of-the-art baselines in low-resource settings, offering a scalable and label-efficient solution for global slum mapping and data-driven urban planning.
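The consistency-based filtering described in the abstract can be illustrated with a minimal sketch. It assumes each expert emits one hard label per pixel and that a pseudo-label is kept only when a chosen fraction of experts agree. The function name `filter_pseudo_labels`, the `min_agreement` threshold, and the `ignore_label` value are hypothetical and are not taken from the paper.

```python
from collections import Counter


def filter_pseudo_labels(expert_preds, min_agreement=1.0, ignore_label=-1):
    """Keep a pixel's majority label only if enough experts agree on it.

    expert_preds: list of per-expert label lists, one label per pixel.
    min_agreement: assumed fraction of experts that must share the label.
    Pixels below the threshold receive ignore_label and would be skipped
    when computing an adaptation loss.
    """
    n_experts = len(expert_preds)
    pseudo = []
    for votes in zip(*expert_preds):  # iterate pixel by pixel
        label, count = Counter(votes).most_common(1)[0]
        keep = count / n_experts >= min_agreement
        pseudo.append(label if keep else ignore_label)
    return pseudo


# Three hypothetical experts, four pixels (1 = slum, 0 = non-slum).
experts = [
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 0],
]
unanimous = filter_pseudo_labels(experts)
majority = filter_pseudo_labels(experts, min_agreement=2 / 3)
print(unanimous)
print(majority)
```

A stricter threshold discards more pixels but leaves fewer noisy labels, which is the trade-off such a filter exposes.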

AAAI 2026

Generalizable Slum Detection from Satellite Imagery with Mixture-of-Experts

technical paper

We are pleased to announce the Fortieth AAAI Conference on Artificial Intelligence (AAAI-26), which will be held in Singapore EXPO from January 20 to January 27, 2026.

The purpose of the AAAI conference series is to promote research in Artificial Intelligence (AI) and foster scientific exchange between researchers, practitioners, scientists, students, and engineers across the entirety of AI and its affiliated disciplines. AAAI-26 will feature technical paper presentations, special tracks, invited speakers, workshops, tutorials, poster sessions, senior member presentations, competitions, and exhibit programs, and a range of other activities to be announced.




This paper presents an eXplainable AI (XAI)-based classroom game “Breakable Machine” for teaching critical, transformative AI literacy through adversarial play and interrogation of AI systems. Designed for learners aged 10–15, the game invites students to spoof an image classifier by manipulating their appearance or environment in order to trigger high-confidence misclassifications. Rather than focusing on building AI models, this activity centers on breaking them, exposing their brittleness, bias, and vulnerability through hands-on, embodied experimentation. The game includes an XAI view to help students visualize feature saliency, revealing how models attend to specific visual cues. A shared classroom leaderboard fosters collaborative inquiry and comparison of strategies, turning the classroom into a site for collective sensemaking. This approach repositions AI education by treating model failure and misclassification not as problems to be debugged, but as pedagogically rich opportunities to interrogate AI as a sociotechnical system. In doing so, the game supports students in developing data agency, ethical awareness, and a critical stance toward AI systems increasingly embedded in everyday life. The game and its source code are freely available.

Breakable Machine: A K–12 Classroom Game for Transformative AI Literacy Through Spoofing and eXplainable AI (XAI)

Dataset distillation creates a small distilled set that enables efficient training by capturing key information from the full dataset. While existing dataset distillation methods perform well on balanced datasets, they struggle under long-tailed distributions, where imbalanced class frequencies induce biased model representations and corrupt statistical estimates such as Batch Normalization (BN) statistics. In this paper, we rethink long-tailed dataset distillation by revisiting the limitations of trajectory-based methods, and instead adopt the statistical alignment perspective to jointly mitigate model bias and restore fair supervision. To this end, we introduce three dedicated components that enable unbiased recovery of distilled images and soft relabeling: (1) enhancing expert models (an observer model for recovery and a teacher model for relabeling) to enable reliable statistics estimation and soft-label generation; (2) recalibrating BN statistics via a full forward pass with dynamically adjusted momentum to reduce representation skew; (3) initializing synthetic images by incrementally selecting high-confidence and diverse augmentations via a multi-round mechanism that promotes coverage and diversity. Extensive experiments on four long-tailed benchmarks show consistent improvements over state-of-the-art methods across varying degrees of class imbalance. Notably, our approach improves top-1 accuracy by 15.6% on CIFAR-100-LT and 11.8% on Tiny-ImageNet-LT under IPC=10 and IF=10.
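To make component (2) more concrete, here is a small sketch of recalibrating a running mean over one full pass through the data. The abstract does not specify the momentum schedule; the `1/t` schedule below is an assumption chosen because it turns the exponential moving average into a plain cumulative average. The function name `recalibrate_mean` and its `fixed_momentum` parameter are hypothetical.

```python
def recalibrate_mean(batches, fixed_momentum=None):
    """Re-estimate a BN-style running mean with one pass over all batches.

    With fixed_momentum=None the momentum at step t is 1/t (an assumed
    schedule), so every batch contributes equally to the final estimate.
    With a fixed momentum, the estimate stays biased toward the initial
    value of 0.0 when only a few batches are seen.
    """
    running = 0.0
    for t, batch in enumerate(batches, start=1):
        batch_mean = sum(batch) / len(batch)
        momentum = 1.0 / t if fixed_momentum is None else fixed_momentum
        running = (1.0 - momentum) * running + momentum * batch_mean
    return running


# Three equally sized toy batches whose overall mean is 6.0.
batches = [[1.0, 3.0], [5.0, 7.0], [9.0, 11.0]]
dynamic = recalibrate_mean(batches)
fixed = recalibrate_mean(batches, fixed_momentum=0.1)
print(dynamic, fixed)
```

Under this assumed schedule the dynamic estimate matches the mean over all batches, while the fixed-momentum estimate remains well below it after three steps.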

Rethinking Long-tailed Dataset Distillation: A Uni-Level Framework with Unbiased Recovery and Relabeling

3D Gaussian Splatting (3DGS) achieves impressive rendering fidelity and speed for novel view synthesis. However, its substantial data size poses a significant challenge for practical applications. While many compression techniques have been proposed, they lack progressivity and therefore cannot efficiently reuse existing bitstreams in on-demand applications, which wastes resources.
To address this issue, we propose __PCGS__ (Progressive Compression of 3D Gaussian Splatting), which adaptively controls __both the quantity and quality__ of Gaussians (or anchors) to enable effective progressivity for on-demand applications. For quantity, we introduce a progressive masking strategy that incrementally incorporates new anchors while refining existing ones to enhance fidelity. For quality, we propose a progressive quantization approach that gradually reduces quantization step sizes to achieve finer modeling of Gaussian attributes. Furthermore, to compact the incremental bitstreams, we leverage existing quantization results to refine probability prediction, improving entropy coding efficiency across progressive levels.
Overall, PCGS achieves progressivity while maintaining compression performance comparable to SoTA non-progressive methods. Our code will be made publicly available.
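The idea of progressively shrinking the quantization step can be illustrated with plain uniform scalar quantization. This is only a stand-in: the step-halving schedule and the names `quantize` and `progressive_levels` are assumptions for illustration and do not describe the actual PCGS codec or its entropy model.

```python
def quantize(value, step):
    """Uniform scalar quantization: snap value to the nearest multiple of step."""
    return round(value / step) * step


def progressive_levels(value, base_step, num_levels):
    """Return (step, reconstruction) pairs with the step halved at each level.

    Halving is an assumed schedule. Each finer level bounds the
    reconstruction error by half of its own, smaller step.
    """
    levels = []
    step = base_step
    for _ in range(num_levels):
        levels.append((step, quantize(value, step)))
        step /= 2
    return levels


attribute = 0.37  # a hypothetical Gaussian attribute value
levels = progressive_levels(attribute, base_step=0.5, num_levels=4)
for step, recon in levels:
    print(f"step={step:<7} recon={recon:<7} error={abs(attribute - recon):.4f}")
```

A decoder that has only received the coarse levels can already reconstruct the attribute, and each additional level tightens the error bound.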

PCGS: Progressive Compression of 3D Gaussian Splatting

Large Language Models (LLMs) are increasingly used as scalable tools for pilot testing, predicting public opinion distributions before deploying costly surveys. However, the prevailing paradigm for evaluating these models relies on traditional structured surveys, a methodology misaligned with more realistic settings such as social media, where opinions are embedded in rich digital context. By design, surveys strip away the social and cultural context that shapes public opinion, and LLM benchmarks built on this paradigm inherit these critical limitations. To bridge this gap, we introduce MindVote, the first benchmark for public opinion prediction grounded in authentic social media discourse. MindVote is constructed from 3,918 naturalistic polls sourced from Reddit and Weibo, spanning 23 topics and enriched with detailed annotations for platform and topical context. Using this benchmark, we conduct a comprehensive evaluation of 15 LLMs, revealing a critical "survey-based specialization pitfall" where models fine-tuned on traditional surveys underperform their general-purpose counterparts and demonstrating the necessity of context in social media. MindVote provides a robust, ecologically valid framework to move beyond survey-based evaluations and advance the development of socially intelligent AI systems.

MindVote: When AI Meets the Wild West of Social Media Opinion

Multi-view diabetic retinopathy (DR) grading has achieved remarkable performance by capturing more comprehensive pathological features than single-view methods. However, complete multi-view fundus images are often difficult to obtain in clinical practice, and the performance degrades significantly when fewer views are available. To overcome this limitation, we propose the first incomplete multi-view DR grading framework, aiming to provide accurate diagnosis regardless of the number of available views. It introduces two novel modules. First, cross-view spatial correlation attention (CSCA) captures region correlations across views, automatically identifying and fusing diagnostically relevant spatial features to improve feature representation. Second, self-supervised mask consistency learning (SMCL) formulates a novel pretext task of missing-view information reconstruction by strategically masking inter- and intra-view regions, enabling the model to infer complete features from incomplete views. Benefiting from CSCA and SMCL, our method enhances structural feature consistency across views and effectively compensates for missing information during DR grading. Extensive experiments demonstrate that our method achieves state-of-the-art grading performance, particularly under realistic conditions where some views are unavailable.

Incomplete Multi-view Diabetic Retinopathy Grading via Self-Supervised Inter- and Intra-View Restoration

Graph Domain Adaptation (GDA) facilitates knowledge transfer from labeled source graphs to unlabeled target graphs by learning domain-invariant representations, which is essential in applications such as molecular property prediction and social network analysis. However, most existing GDA methods rely on the assumption of clean source labels, which rarely holds in real-world scenarios where annotation noise is pervasive. This label noise severely impairs feature alignment and degrades adaptation performance under domain shifts. To address this challenge, we propose Nested Graph Pseudo-Label Refinement (NeGPR), a novel framework tailored for graph-level domain adaptation with noisy labels. NeGPR first pretrains dual branches, i.e., semantic and topology branches, by enforcing neighborhood consistency in the feature space, thereby reducing the influence of noisy supervision. To bridge domain gaps, NeGPR employs a nested refinement mechanism in which one branch selects high-confidence target samples to guide the adaptation of the other, enabling progressive cross-domain learning. Furthermore, since pseudo-labels may still contain noise and the pre-trained branches are already overfitted to the noisy labels in the source domain, NeGPR incorporates a noise-aware regularization strategy. This regularization is theoretically proven to mitigate the adverse effects of pseudo-label noise, even under the presence of source overfitting, thus enhancing the robustness of the adaptation process. Extensive experiments on benchmark datasets demonstrate that NeGPR consistently outperforms state-of-the-art methods under severe label noise.

Nested Graph Pseudo-Label Refinement for Noisy Label Domain Adaptation Learning

Single-cell RNA sequencing (scRNA-seq), especially temporally resolved datasets, enables genome-wide profiling of gene expression dynamics at single-cell resolution across discrete time points. However, current technologies provide only sparse, static snapshots of cell states and are inherently influenced by technical noise, complicating the inference and representation of continuous transcriptional dynamics. Although embedding methods can reduce dimensionality and mitigate technical noise, the majority of existing approaches typically treat trajectory inference separately from embedding construction, often neglecting temporal structure. To address this challenge, here we introduce CellStream, a novel deep learning framework that jointly learns embedding and cellular dynamics from single-cell snapshot data by integrating an autoencoder with unbalanced dynamical optimal transport. Compared to existing methods, CellStream generates dynamics-informed embeddings that robustly capture temporal developmental processes while maintaining high consistency with the underlying data manifold. We demonstrate CellStream’s effectiveness on both simulated datasets and real scRNA-seq data, including spatial transcriptomics. Our experiments indicate significant quantitative improvements over state-of-the-art methods in representing cellular trajectories with enhanced temporal coherence and reduced noise sensitivity. Overall, CellStream provides a new tool for learning and representing continuous streams from the noisy, static snapshots of single-cell gene expression.

CellStream: Dynamical Optimal Transport Informed Embeddings for Reconstructing Cellular Trajectories from Snapshots Data

Autoformalization aims to translate natural-language mathematical statements into a formal language. While LLMs have accelerated progress in this area, existing methods still suffer from low accuracy. We identify two key abilities for effective autoformalization: comprehensive mastery of formal-language domain knowledge, and the reasoning capability required for natural-language problem understanding and informal-formal alignment. Without the former, a model cannot identify the correct formal objects; without the latter, it struggles to interpret real-world contexts and map them precisely into formal expressions. To address these gaps, we introduce ThinkingF, a data synthesis and training pipeline that improves both abilities. First, we construct two datasets: one by distilling and selecting large-scale examples rich in formal knowledge, and another by generating informal-to-formal reasoning trajectories guided by expert-designed templates. We then apply SFT and RLVR with these datasets to further fuse and refine the two abilities. The resulting 7B and 32B models exhibit both comprehensive formal knowledge and strong informal-to-formal reasoning. Notably, StepFun-Formalizer-32B achieves SOTA BEq@1 scores of 40.5% on FormalMATH-Lite and 26.7% on ProverBench, surpassing all prior general-purpose and specialized models.

StepFun-Formalizer: Unlocking the Autoformalization Potential of LLMs Through Knowledge-Reasoning Fusion

We formalize AI alignment as a multi-objective optimization problem called $\langle M,N,\varepsilon,\delta\rangle$-agreement that generalizes prior approaches with fewer assumptions, in which a set of $N$ agents (including humans) must reach approximate ($\varepsilon$) agreement across $M$ candidate objectives with probability at least $1-\delta$. 
Using communication complexity, we prove an information-theoretic lower bound demonstrating that once either $M$ or $N$ is large enough, no interaction or rationality can avoid intrinsic alignment overheads. 
This barrier establishes rigorous intrinsic limits to alignment *itself*, not merely to specific methods, clarifying a crucial "no free lunch" principle: encoding "all human values" inevitably leads to misalignment, requiring future methods to explicitly manage complexity through consensus-driven reduction or prioritization of objectives.
Complementing this impossibility result, we provide explicit algorithms achieving alignment under both computationally unbounded and bounded rationality with noisy messages. 
Even in these best-case scenarios where alignment to arbitrary precision is theoretically guaranteed, our analysis identifies three critical scalability barriers: the number of tasks ($M$), the number of agents ($N$), and the task state space size ($D$), thereby highlighting fundamental complexity-theoretic constraints and providing guidelines for safer, scalable human–AI collaboration.

Intrinsic Barriers and Practical Pathways for Human–AI Alignment: An Agreement-Based Complexity Analysis

Prompt Tuning (PT) is a widely used strategy for adapting pre-trained Vision-Language Models (VLMs) to various downstream tasks. Conventional PT methods evaluate performance separately on known (base) and unknown (new) classes. However, in real-world scenarios, models often encounter inputs without prior knowledge of their class domain. This challenge has motivated the development of Open-world Prompt Tuning (OPT), which requires models to first determine whether a sample belongs to base or new classes and then classify it accordingly. In this work, we carefully review existing OPT methods and identify three key limitations: (L1) incomplete evaluation metrics, (L2) time-consuming and memory-intensive OOD detection methods, and (L3) insufficiently comprehensive optimization strategies. To address these issues, we first tackle L1 by proposing two novel metrics to explicitly evaluate adaptability and generalization under the OPT setting, forming a more comprehensive evaluation framework. For L2, we propose a training-free OOD detection method called Entropy-weighted Rank-normalized Fusion (ERF), which first applies rank normalization to both the maximum and the sum of base-class probabilities, followed by an entropy-weighted fusion of the normalized values. For L3, we propose a plug-and-play Gated Dual-Merging (GDM) strategy to strengthen the classifier’s capability. GDM performs selective merging at the weight level based on an adaptive criterion and combines fine-tuned and LLM-boosted logits at the output level. Extensive experiments on three PT baselines across 11 datasets demonstrate the effectiveness of our proposed ERF and GDM.
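The ERF description above (rank-normalize the maximum and the sum of base-class probabilities, then fuse them with an entropy-based weight) can be sketched as follows. The abstract does not give the exact weighting, so the rule used here, `w = 1 - normalized entropy` applied to the max-probability rank, is an assumption, and the names `rank_normalize`, `normalized_entropy`, and `erf_like_scores` are hypothetical.

```python
import math


def rank_normalize(values):
    """Map values to ranks scaled into [0, 1]; ties share the average rank."""
    n = len(values)
    if n == 1:
        return [0.5]
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1  # extend over a run of tied values
        avg_rank = (i + j) / 2
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank / (n - 1)
        i = j + 1
    return ranks


def normalized_entropy(probs):
    """Shannon entropy divided by log(num_classes); needs at least 2 classes."""
    h = -sum(p * math.log(p) for p in probs if p > 0)
    return h / math.log(len(probs))


def erf_like_scores(prob_rows, base_idx):
    """Higher score = more likely a base-class sample (assumed fusion rule)."""
    max_base = [max(row[i] for i in base_idx) for row in prob_rows]
    sum_base = [sum(row[i] for i in base_idx) for row in prob_rows]
    r_max, r_sum = rank_normalize(max_base), rank_normalize(sum_base)
    scores = []
    for row, a, b in zip(prob_rows, r_max, r_sum):
        w = 1.0 - normalized_entropy(row)  # assumed: confident rows trust the max
        scores.append(w * a + (1.0 - w) * b)
    return scores


# Softmax outputs over 4 classes; classes 0 and 1 are the base classes.
rows = [
    [0.90, 0.05, 0.03, 0.02],
    [0.10, 0.10, 0.40, 0.40],
    [0.40, 0.30, 0.20, 0.10],
]
scores = erf_like_scores(rows, base_idx=[0, 1])
print(scores)
```

Because both signals are converted to ranks within the batch, the fusion needs no training and no calibration of raw probability scales, which is consistent with the training-free claim in the abstract.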
