Singapore


AAAI 2026

Debiasing Machine Learning Predictions for Causal Inference Without Additional Ground Truth Data: “One Map, Many Trials” in Satellite-Driven Poverty Analysis

poverty mapping

data-scarce settings

attenuation bias

causal inference

bias mitigation

Machine learning models trained on Earth observation data, such as satellite imagery, have demonstrated significant promise in predicting household-level wealth indices, enabling the creation of high-resolution wealth maps that can be leveraged across multiple causal trials while addressing chronic data scarcity in global development research. However, because standard training objectives prioritize overall predictive accuracy, these predictions inherently suffer from shrinkage toward the mean, leading to attenuated estimates of causal treatment effects and limiting their utility in policy evaluations. Existing debiasing methods, such as Prediction-Powered Inference (PPI), can handle this attenuation bias but require additional fresh ground-truth data at the downstream stage of causal inference, which restricts their applicability in data-scarce environments. In this paper, we introduce and evaluate two correction methods, linear calibration correction (LCC) and Tweedie's correction, that substantially reduce prediction bias without relying on newly collected labeled data. LCC corrects bias through a straightforward linear transformation derived from held-out calibration data, whereas Tweedie's correction leverages empirical Bayes principles to directly address shrinkage-induced biases by exploiting score functions derived from evaluating the model's learning patterns. Through analytical exercises and experiments using Demographic and Health Survey (DHS) data, we demonstrate that both proposed methods meet or outperform existing approaches that either require (a) adjustments to training pipelines or (b) additional labeled data, achieving significant reductions in attenuation bias in data-scarce environments.
These approaches may represent a promising avenue for improving the reliability of causal inference when direct outcome measures are limited or unavailable, enabling a "one map, many trials" paradigm where a single upstream data creation team produces predictions usable by many downstream teams across diverse ML pipelines.
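To make the attenuation problem and the linear calibration idea concrete, the following is a minimal simulation sketch, not the authors' implementation. It assumes a toy data-generating process (the constants `TAU`, `0.2`, `0.6`, and the noise scales are invented for illustration) and one plausible reading of linear calibration: fit a line from true values to predictions on held-out calibration data, then invert that line on the downstream predictions.

```python
import numpy as np

rng = np.random.default_rng(0)
TAU = 1.0  # true treatment effect (simulation assumption)


def simulate(n):
    """Toy data: the prediction is a shrunken, noisy copy of the true outcome."""
    treat = rng.integers(0, 2, size=n)
    y = TAU * treat + rng.normal(size=n)
    # Shrinkage toward the mean: slope 0.6 < 1 (hypothetical value).
    pred = 0.2 + 0.6 * y + rng.normal(scale=0.3, size=n)
    return treat, y, pred


def diff_in_means(outcome, treat):
    """Simple treatment-effect estimate: treated mean minus control mean."""
    return outcome[treat == 1].mean() - outcome[treat == 0].mean()


# Held-out calibration data where both labels and predictions are available.
_, y_cal, pred_cal = simulate(1000)
# Regress predictions on true values; polyfit returns [slope, intercept].
slope, intercept = np.polyfit(y_cal, pred_cal, deg=1)

# Downstream trial: only predictions are used, no new labels.
treat, _, pred = simulate(5000)
naive_effect = diff_in_means(pred, treat)
corrected_effect = diff_in_means((pred - intercept) / slope, treat)

print(f"naive: {naive_effect:.3f}  corrected: {corrected_effect:.3f}  true: {TAU}")
```

In this toy setting the naive estimate is attenuated by roughly the shrinkage slope, while inverting the fitted line moves the estimate back toward the true effect. Tweedie's correction is not shown here.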

technical paper

We are pleased to announce the Fortieth AAAI Conference on Artificial Intelligence (AAAI-26), which will be held at Singapore EXPO from January 20 to January 27, 2026.

The purpose of the AAAI conference series is to promote research in Artificial Intelligence (AI) and foster scientific exchange between researchers, practitioners, scientists, students, and engineers across the entirety of AI and its affiliated disciplines. AAAI-26 will feature technical paper presentations, special tracks, invited speakers, workshops, tutorials, poster sessions, senior member presentations, competitions, and exhibit programs, and a range of other activities to be announced.


Recent learning-based methods have reduced the computational complexity of traditional trajectory similarity computation, but state-of-the-art (SOTA) methods still fail to leverage the comprehensive spectrum of trajectory information for similarity modeling. To tackle this problem, we propose RePo, a novel method that jointly encodes Region-wise and Point-wise features to capture both spatial context and fine-grained moving patterns. For region-wise representation, the GPS trajectories are first mapped to grid sequences; spatial context is captured by structural features, and semantic context is enriched by visual features. For point-wise representation, three lightweight expert networks extract local, correlation, and continuous movement patterns from dense GPS sequences. Then, a router network adaptively fuses the learned point-wise features, which are subsequently combined with region-wise features using cross-attention to produce the final trajectory embedding. To train RePo, we adopt a contrastive loss with hard negative samples to provide similarity ranking supervision. Experimental results show that RePo achieves an average accuracy improvement of 22.2% over SOTA baselines across all evaluation metrics.

Region-Point Joint Representation for Effective Trajectory Similarity Learning
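The first region-wise step in the RePo abstract, mapping GPS trajectories to grid sequences, can be illustrated with a small sketch. This is an assumption-laden illustration rather than the paper's code: the function name `to_grid_sequence`, the uniform degree-based cell size, and the choice to collapse consecutive repeats of the same cell are all hypothetical.

```python
def to_grid_sequence(points, min_lon, min_lat, cell_deg):
    """Map (lon, lat) points to a sequence of (col, row) grid cells.

    Consecutive points that fall in the same cell are merged into one entry
    (an illustrative choice, not stated in the abstract).
    """
    sequence = []
    for lon, lat in points:
        cell = (
            int((lon - min_lon) // cell_deg),  # column index
            int((lat - min_lat) // cell_deg),  # row index
        )
        if not sequence or sequence[-1] != cell:
            sequence.append(cell)
    return sequence


# Hypothetical trajectory with four GPS fixes.
trajectory = [(103.801, 1.301), (103.804, 1.302), (103.812, 1.303), (103.823, 1.315)]
grid_seq = to_grid_sequence(trajectory, min_lon=103.8, min_lat=1.3, cell_deg=0.01)
print(grid_seq)
```

The dense point-wise branch would operate on the raw `trajectory` list instead, which is why the two views carry complementary information.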

Multimodal Large Language Models struggle to maintain reliable performance under extreme real-world visual degradations, which impede their practical robustness. Current robustness enhancement methods rely on implicit training/adaptation that focuses solely on visual encoder generalization, suffering from limited interpretability and isolated optimization. To overcome these limitations, we propose Robust-R1, a novel framework that explicitly models visual degradations through structured reasoning chains. Our approach integrates: (i) supervised fine-tuning for degradation-aware reasoning foundations, (ii) reward-driven alignment for accurately perceiving degradation parameters, and (iii) dynamic reasoning depth scaling adapted to degradation intensity. To support this methodology, we introduce a novel 11K dataset featuring realistic degradations synthesized across four critical real-world visual processing stages, each annotated with structured chains connecting degradation parameters, perceptual effects, and pristine semantic reasoning. Comprehensive evaluations demonstrate state-of-the-art robustness: Robust-R1 outperforms all general and robust baselines on the real-world degradation benchmark R-Bench, while maintaining superior anti-degradation performance under multi-intensity adversarial degradations on MMBench, MMStar, and RealWorldQA.
We will release our code, demo, and dataset soon.

Robust-R1: Degradation-Aware Reasoning for Robust Visual Understanding

Recently, Multimodal Large Language Models (MLLMs) have achieved considerable advancements in vision-language tasks, yet they can produce potentially harmful or untrustworthy content. Despite substantial work investigating the trustworthiness of language models, MLLMs' capability to act honestly, especially when faced with visually unanswerable questions, remains largely underexplored. This work presents the first systematic assessment of honesty behaviors across various MLLMs. We ground honesty in models' response behaviors to unanswerable visual questions, define four representative types of such questions, and construct MoHoBench, a large-scale MLLM honesty benchmark consisting of 12k+ visual question samples, whose quality is guaranteed by multi-stage filtering and human verification. Using MoHoBench, we benchmarked the honesty of 28 popular MLLMs and conducted a comprehensive analysis. Our findings show that: (1) most models fail to appropriately refuse to answer when necessary, and (2) MLLMs' honesty is not solely a language modeling issue, but is deeply influenced by visual information, necessitating the development of dedicated methods for multimodal honesty alignment. Therefore, we implemented initial alignment methods using supervised and preference learning to improve honesty behavior, providing a foundation for future work on trustworthy MLLMs.

MoHoBench: Assessing Honesty of Multimodal Large Language Models via Unanswerable Visual Questions

Point-of-Interest (POI) recommendation plays a pivotal role in location-based services by guiding users to discover new and relevant places. While graph-based methods have shown promising results, effectively modeling the diversity and dynamics of user preferences remains a key challenge. Addressing this requires richer representations of both POIs and user interests, as well as more adaptive learning strategies.
In this work, we propose TMHKG, a Task-aware Meta-learning framework with a Heterogeneous Knowledge Graph for POI recommendation. To enhance representation learning, TMHKG constructs a dual-view POI knowledge graph that integrates geographical proximity and user-aware category transitions, and models users' evolving interests from sequential visit histories. On top of enriched features, TMHKG adopts a task-aware meta-learning paradigm, treating each user's recommendation task as a separate meta-task. A generalizable recommendation policy is first learned from diverse training tasks and then quickly adapted to each user's unique behavior, enabling highly personalized predictions.
Extensive experiments on two real-world datasets demonstrate that TMHKG consistently outperforms state-of-the-art baselines, highlighting its effectiveness in capturing complex user-POI interactions.

Task-Aware Meta-Learning on Heterogeneous Knowledge Graph for POI Recommendation

Pre-trained Vision-Language Models (VLMs), e.g., CLIP, have become essential tools in multimodal transfer learning. However, fine-tuning VLMs in few-shot scenarios poses significant challenges in balancing task-specific adaptation and generalization in the obtained model. Meanwhile, current research has predominantly focused on prompt-based adaptation methods, leaving adapter-based approaches underexplored and revealing notable performance gaps. To address these challenges, we introduce a novel Reconstruction-based Multimodal Adapter (RMAdapter), which leverages a dual-branch architecture. Unlike conventional single-branch adapters, RMAdapter consists of: (1) an adaptation branch that injects task-specific knowledge through parameter-efficient fine-tuning, and (2) a reconstruction branch that preserves general knowledge by reconstructing latent space features back into the original feature space. This design facilitates a dynamic balance between general and task-specific knowledge. Importantly, although RMAdapter introduces an additional reconstruction branch, it is carefully optimized to remain lightweight. By computing reconstruction loss locally at each layer and sharing projection modules, the overall computational overhead is kept minimal. A consistency constraint is also incorporated to better regulate the trade-off between discriminability and generalization. We comprehensively evaluate the effectiveness of RMAdapter on three representative tasks: generalization to new categories, generalization to new target datasets, and domain generalization. Without relying on data augmentation or duplicate prompt designs, our RMAdapter consistently outperforms state-of-the-art approaches across all evaluation metrics.

RMAdapter: Reconstruction-based Multi-Modal Adapter for Vision-Language Models

Efficiently and accurately determining the symmetry is a crucial step in the structural analysis of crystalline materials. Existing methods usually apply deep learning models indiscriminately while ignoring the underlying chemical rules. More importantly, experiments show that they face a serious sub-property confusion (SPC) problem. To address these challenges, we take a decoupled perspective and introduce the XRDecoupler framework, a problem-solving arsenal specifically designed to tackle the SPC problem. Imitating the thinking process of chemists, we incorporate multidimensional crystal symmetry information as superclass guidance to ensure that the model's prediction process aligns with chemical intuition. We further design a hierarchical PXRD pattern learning model and a multi-objective optimization approach to achieve high-quality representation and balanced optimization. Comprehensive evaluations on three mainstream databases (CCDC, CoREMOF, and InorganicData) demonstrate that XRDecoupler excels in performance, interpretability, and generalization. The code for our method is available in the supplement.

Rethinking Crystal Symmetry Prediction: A Decoupled Perspective

Object 6D pose estimation is a challenging task that is crucial for robotics and augmented reality applications, particularly when dealing with novel objects. A promising direction is single-reference-based estimation, which requires only a single annotated view instead of a full 3D model. However, existing methods rely on dense correspondence regression, which suffers from limited global consistency due to the local nature of convolutional architectures, and faces challenges in symmetric or occluded scenarios due to deterministic predictions.
We present CoordAR, a novel autoregressive framework for single-reference 6D pose estimation of unseen objects. CoordAR formulates 3D-3D correspondences between the reference and query views as a discretized coordinate map, which is decoded autoregressively in a probabilistic manner. To enable accurate correspondence regression, CoordAR introduces: 1) a novel coordinate map tokenization enabling probabilistic prediction over discretized 3D space; 2) a decoupled encoding strategy that separately encodes RGB appearance and coordinate cues; and 3) an autoregressive transformer decoder conditioned on both pixel-aligned query features and the partially generated coordinate sequence.
Thanks to the novel designs, CoordAR significantly outperforms existing methods on multiple benchmarks and demonstrates strong robustness to symmetry, occlusion, and other challenges in real-world tests, while requiring only a single reference view.

CoordAR: One-Reference 6D Pose Estimation of Novel Objects via Autoregressive Coordinate Map Generation
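The coordinate map tokenization mentioned in the CoordAR abstract turns continuous 3D coordinates into discrete symbols that an autoregressive decoder can predict. The sketch below shows one simple way such a discretization could work, uniform per-axis binning; the bin count, the normalized range of [-1, 1], and the function names are assumptions made for illustration, and the paper's actual tokenizer may differ.

```python
import numpy as np


def tokenize(coords, bins=256, lo=-1.0, hi=1.0):
    """Quantize coordinates in [lo, hi] into integer tokens in [0, bins - 1]."""
    scaled = (np.clip(coords, lo, hi) - lo) / (hi - lo)
    return np.minimum((scaled * bins).astype(np.int64), bins - 1)


def detokenize(tokens, bins=256, lo=-1.0, hi=1.0):
    """Map each token back to the center of its bin."""
    return lo + (tokens + 0.5) / bins * (hi - lo)


rng = np.random.default_rng(0)
# Hypothetical 4x4 coordinate map with (x, y, z) per pixel.
coord_map = rng.uniform(-1.0, 1.0, size=(4, 4, 3))
tokens = tokenize(coord_map)
recovered = detokenize(tokens)
max_error = np.abs(recovered - coord_map).max()
print(tokens.shape, max_error)
```

With uniform bins the round-trip error is bounded by half a bin width, so the bin count trades vocabulary size against coordinate precision.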

Modular design of planning-oriented autonomous driving has markedly advanced end-to-end systems. However, existing architectures remain constrained by an over-reliance on ego status, hindering generalization and robust scene understanding. We identify the root cause as an inherent design within these architectures that allows ego status to be easily leveraged as a shortcut. Specifically, the premature fusion of ego status in the upstream BEV encoder allows an information flow from this strong prior to dominate the downstream planning module. To address this challenge, we propose AdaptiveAD, an architectural-level solution based on a multi-context fusion strategy. Its core is a dual-branch structure that explicitly decouples scene perception and ego status. One branch performs scene-driven reasoning based on multi-task learning, but with ego status deliberately omitted from the BEV encoder, while the other conducts ego-driven reasoning based solely on the planning task. A scene-aware fusion module then adaptively integrates the complementary decisions from the two branches to form the final planning trajectory. To ensure this decoupling does not compromise multi-task learning, we introduce a path attention mechanism for ego-BEV interaction and add two targeted auxiliary tasks: BEV unidirectional distillation and autoregressive online mapping. Extensive evaluations on the nuScenes dataset demonstrate that AdaptiveAD achieves state-of-the-art open-loop planning performance. Crucially, it significantly mitigates the over-reliance on ego status and exhibits impressive generalization capabilities across diverse scenarios. We will release the source code upon paper acceptance.

Decoupling Scene Perception and Ego Status: A Multi-Context Fusion Approach for Enhanced Generalization in End-to-End Autonomous Driving

We consider the problem of modifying a description logic concept in light of models represented as pointed interpretations. We call this setting model change, and distinguish three main kinds of changes: eviction, which consists of only removing models; reception, which incorporates models; and revision, which combines removal with incorporation of models in a single operation. We introduce a formal notion of revision and argue that it does not reduce to a simple combination of eviction and reception, contrary to intuition. We provide positive and negative results on the compatibility of eviction and reception for EL-bottom and ALC description logic concepts and on the compatibility of revision for ALC concepts.

Model Change for Description Logic Concepts

The design of Large Language Models (LLMs) has long been hampered by a fundamental conflict within their core attention mechanism: its remarkable expressivity is built upon a computational complexity of $O(H \cdot N^2)$ that grows quadratically with the context size ($N$) and linearly with the number of heads ($H$). This standard implementation harbors significant computational redundancy, as all heads independently compute attention over the same sequence space. Existing sparse methods, meanwhile, often trade information integrity for computational efficiency. To resolve this efficiency-performance trade-off, we propose SPAttention, whose core contribution is the introduction of a new paradigm we term Principled Structural Sparsity. SPAttention does not merely drop connections but instead reorganizes the computational task by partitioning the total attention workload into balanced, non-overlapping distance bands, assigning each head a unique segment. This approach transforms the multi-head attention mechanism from $H$ independent $O(N^2)$ computations into a single, collaborative $O(N^2)$ computation, fundamentally reducing complexity by a factor of $H$. The structured inductive bias compels functional specialization among heads, enabling a more efficient allocation of computational resources from redundant modeling to distinct dependencies across the entire sequence span. Extensive empirical validation on the OLMoE-1B-7B and 0.25B-1.75B model series demonstrates that while delivering an approximately two-fold increase in training throughput, its performance is on par with standard dense attention, even surpassing it on select key metrics, while consistently outperforming representative sparse attention methods including Longformer, Reformer, and BigBird across all evaluation metrics. 
Our work demonstrates that thoughtfully designed structural sparsity can serve as an effective inductive bias that simultaneously improves both computational efficiency and model performance, opening a new avenue for the architectural design of next-generation, high-performance LLMs.
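The band-partitioning idea in the SPAttention abstract can be sketched as a set of per-head attention masks. The code below is a hedged illustration, not the paper's implementation: it assumes causal attention, defines distance as the query index minus the key index, and balances bands by the number of query-key pairs they contain. The function name `band_masks` and the balancing rule are hypothetical.

```python
import numpy as np


def band_masks(seq_len, num_heads):
    """Return a (num_heads, seq_len, seq_len) boolean array of disjoint masks.

    Head h attends only to query-key pairs whose distance (i - j) lies in its
    band. Band edges are chosen so each band holds a similar number of pairs.
    """
    idx = np.arange(seq_len)
    dist = idx[:, None] - idx[None, :]  # negative above the diagonal
    pairs_per_dist = seq_len - idx      # causal pairs at each distance d
    cumulative = np.cumsum(pairs_per_dist)
    targets = cumulative[-1] * np.arange(1, num_heads) / num_heads
    cuts = np.searchsorted(cumulative, targets) + 1  # exclusive upper edges
    edges = np.concatenate(([0], cuts, [seq_len]))
    masks = [(dist >= lo) & (dist < hi) for lo, hi in zip(edges[:-1], edges[1:])]
    return np.stack(masks)


masks = band_masks(seq_len=16, num_heads=4)
causal = np.tril(np.ones((16, 16), dtype=int))
print(masks.sum(axis=(1, 2)))  # number of attended pairs per head
```

Because the masks are disjoint and together cover the causal triangle, the heads jointly perform about one full pass over the sequence pairs instead of one pass each, which matches the factor-of-$H$ reduction the abstract describes.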
