The success of deep learning depends heavily on large-scale labeled data. This poses a formidable challenge in fields like molecular design and materials science, where data annotation is prohibitively expensive. Consequently, developing label-efficient learning methods that maximize model performance under limited annotation budgets has become increasingly critical.
However, most current mainstream label-efficient algorithms, such as active learning and semi-supervised learning, are designed primarily for Euclidean data such as images. They cannot effectively process non-Euclidean graph-structured data and thus overlook the rich topological information embedded within it.
In this talk, we aim to bridge this gap through a progressive research path that addresses three core challenges in annotating graph-structured data. First, to reduce the high cost of annotation, we adapt active learning and semi-supervised learning from general domains to explicit graph data, enabling precise labeling of high-value nodes. Second, to address label scarcity, we pioneer methods that construct and leverage implicit graph structures, propagating existing labels and generating new information to boost the performance of semi-supervised and self-supervised learning. Finally, to address label noise, we fuse explicit and implicit graphs: by learning an implicit structure from noisy explicit graph data, our methods identify and mitigate the impact of noise.
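To make the second idea concrete, one common flavor of this approach builds an implicit graph from raw features (e.g., a k-nearest-neighbor similarity graph) and then propagates the few available labels along its edges. The following is a minimal, self-contained sketch of that pattern; the toy data, the kNN construction, and the majority-vote update rule are illustrative assumptions, not the speaker's actual method.

```python
import math

# Toy node features: two clusters in 2D; only one node per cluster is labeled,
# mimicking the label-scarcity setting discussed in the talk.
features = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),   # cluster A
            (5.0, 5.0), (5.1, 5.2), (5.2, 5.1)]   # cluster B
seed_labels = {0: "A", 3: "B"}

def knn_graph(points, k=2):
    """Construct an implicit graph by linking each node to its k nearest neighbors."""
    edges = {}
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j)
                       for j, q in enumerate(points) if j != i)
        edges[i] = [j for _, j in dists[:k]]
    return edges

def propagate(edges, labels, iters=10):
    """Iteratively assign each unlabeled node the majority label of its neighbors."""
    current = dict(labels)
    for _ in range(iters):
        updated = dict(current)
        for node, nbrs in edges.items():
            if node in labels:
                continue  # ground-truth labels stay fixed
            votes = [current[n] for n in nbrs if n in current]
            if votes:
                updated[node] = max(set(votes), key=votes.count)
        current = updated
    return current

graph = knn_graph(features, k=2)
result = propagate(graph, seed_labels)
```

Here the implicit structure (the kNN graph) does the heavy lifting: two seed labels spread to all six nodes because neighbors in feature space tend to share a class, which is the homophily assumption underlying label propagation.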
