Singapore

While recent methods automate concept generation using Large Language Models (LLMs) and Vision-Language Models (VLMs), they still face three fundamental challenges: poor visual grounding, concept redundancy, and the absence of principled metrics to balance predictive accuracy and concept compactness. We introduce PS-CBM, a Partially Shared CBM framework that addresses these limitations through three core components: (1) a multimodal concept generator that integrates LLM-derived semantics with exemplar-based visual cues; (2) a Partially Shared Concept Strategy that merges concepts based on activation patterns to balance specificity and compactness; and (3) Concept-Efficient Accuracy (CEA), a post-hoc metric that jointly captures both predictive accuracy and concept compactness. Extensive experiments on eleven diverse datasets show that PS-CBM consistently outperforms state-of-the-art CBMs, improving classification accuracy by 1.0%–7.4% and CEA by 2.0%–9.5%, while requiring significantly fewer concepts. These results underscore PS-CBM's effectiveness in achieving both high accuracy and strong interpretability.

AAAI 2026

Partially Shared Concept Bottleneck Models

concept bottleneck models

vision-language models

large language models

interpretability


poster

We are pleased to announce the Fortieth AAAI Conference on Artificial Intelligence (AAAI-26), which will be held at Singapore EXPO from January 20 to January 27, 2026.

The purpose of the AAAI conference series is to promote research in Artificial Intelligence (AI) and foster scientific exchange between researchers, practitioners, scientists, students, and engineers across the entirety of AI and its affiliated disciplines. AAAI-26 will feature technical paper presentations, special tracks, invited speakers, workshops, tutorials, poster sessions, senior member presentations, competitions, exhibit programs, and a range of other activities to be announced.



Gradient Boosting Decision Trees (GBDTs) are widely used in industry and academia for their high accuracy and efficiency, particularly on structured data. However, the subject of watermarking GBDT models remains underexplored, especially compared to neural networks. In this work, we present the first robust watermarking framework tailored to GBDT models, utilizing in-place fine-tuning to embed imperceptible and resilient watermarks. We propose four embedding strategies, each designed to minimize impact on model accuracy while ensuring watermark robustness. Through experiments across diverse datasets, we demonstrate that our methods achieve high watermark embedding rates, low accuracy degradation, and strong resistance to post-deployment fine-tuning.

Robust Watermarking on Gradient Boosting Decision Trees

We present NoReGeo, a novel benchmark designed to evaluate the intrinsic geometric understanding of large language models (LLMs) without relying on reasoning or algebraic computation. Unlike existing benchmarks, which primarily assess proficiency in reasoning-based geometry where solutions are derived using algebraic methods, NoReGeo evaluates whether LLMs can inherently encode spatial relationships and recognize geometric properties directly. Our benchmark comprises 2,500 trivial geometric problems spanning 25 categories, each carefully crafted to be solvable purely through native geometric understanding, assuming known object locations. We assess a range of state-of-the-art models on NoReGeo, including frontier models like GPT-4, observing that even the most advanced systems achieve at most 65% accuracy in binary classification tasks. Further, our ablation experiments demonstrate that such geometric understanding does not emerge through fine-tuning alone, indicating that effective training for geometric comprehension requires a specialized approach from the outset. Our findings highlight a significant gap in current LLMs' ability to natively grasp geometric concepts, providing a foundation for future research toward models with true geometric cognition.

NoReGeo: Non-Reasoning Geometry Benchmark

Deep neural networks (DNNs) have demonstrated remarkable performance across various domains, yet their decision-making processes remain opaque. Although many explanation methods aim to make DNNs more transparent, they exhibit significant limitations: post-hoc explanation methods often struggle to faithfully reflect model behaviors, while self-explaining neural networks sacrifice performance and compatibility due to their specialized architectural designs. To address these challenges, we propose a novel self-explaining framework that integrates Shapley value estimation as an auxiliary task during training, which achieves two key advancements: 1) a fair allocation of the model prediction scores to image patches, ensuring explanations inherently align with the model's decision logic, and 2) enhanced interpretability with minor structural modifications, preserving model performance and compatibility. Extensive experiments on multiple benchmarks demonstrate that our method achieves state-of-the-art interpretability.

Enhancing Interpretability for Vision Models via Shapley Value Optimization
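For readers unfamiliar with the "fair allocation" of a prediction score mentioned in the abstract above, the following is a minimal sketch of an exact Shapley value computation over a handful of image patches. It is not the authors' method (the abstract describes learning to estimate these values as an auxiliary training task); the function names `shapley_values` and `toy_score`, and the toy scoring rule itself, are hypothetical and chosen purely for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(n_players, value):
    """Exact Shapley values for a set function `value` over players 0..n-1.

    Enumerates every coalition, so it is only feasible for small n.
    """
    phi = [0.0] * n_players
    for i in range(n_players):
        others = [j for j in range(n_players) if j != i]
        for k in range(len(others) + 1):
            # Weight of a coalition of size k that does not contain player i.
            weight = factorial(k) * factorial(n_players - k - 1) / factorial(n_players)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of player i to coalition s.
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_score(patches):
    """Hypothetical model score as a function of the visible patch indices."""
    score = 0.0
    if 0 in patches:
        score += 0.5  # patch 0 contributes on its own
    if 1 in patches and 2 in patches:
        score += 0.4  # patches 1 and 2 only matter together
    return score  # patch 3 never affects the score


phi = shapley_values(4, toy_score)
print(phi)
```

In this toy game, patch 0 receives its full standalone contribution, patches 1 and 2 split their joint contribution equally, the irrelevant patch 3 receives zero, and the attributions sum to the score of the full image minus the score of the empty image. Exact enumeration scales exponentially in the number of patches, which is one reason practical systems rely on estimation.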

We study the nonlinear dynamics of binary opinions in a population of agents connected by a directed network, influenced by two competing forces. On the one hand, agents are stubborn, i.e., have a tendency for one of the two opinions; on the other hand, there is a disruptive bias, $p\in[0,1]$, that drives the agents toward the other opinion. The disruptive bias models external factors, such as market innovations or social controllers, aiming to challenge the status quo, while agents' stubbornness reinforces the initial opinion, making it harder for the external bias to drive the process toward change. Each agent updates its opinion according to a nonlinear function of the states of its neighbors and of the bias $p$. We consider the case of random directed graphs with prescribed in- and out-degree sequences and we prove that the dynamics exhibits a phase transition: when the disruptive bias $p$ is larger than a critical threshold $p_c$, the population converges in constant time to a consensus on the disruptive opinion. Conversely, when the bias $p$ is less than $p_c$, the system enters a metastable state in which only a fraction of agents $q_\star(p)<1$ will share the new opinion for a long time. We characterize $p_c$ and $q_\star(p)$ explicitly, showing that they depend only on a few simple statistics of the degree sequences. Our analysis relies on a dual system of branching, coalescing, and dying particles, which we show exhibits equivalent behavior and allows a rigorous characterization of the system's dynamics. Our results characterize the interplay between the degree of the agents, their stubbornness, and the external bias, shedding light on the tipping points of opinion dynamics in networks.

A Phase Transition for Opinion Dynamics with Competing Biases

Single-image-to-3D models typically follow a sequential generation and reconstruction workflow. However, intermediate multi-view images synthesized by pre-trained generation models often lack cross-view consistency (CVC), significantly degrading 3D reconstruction performance. While recent methods attempt to refine CVC by feeding reconstruction results back into the multi-view generator, these approaches struggle with noisy and unstable reconstruction outputs that limit effective CVC improvement.
We introduce AlignCVC, a novel framework that fundamentally re-frames single-image-to-3D generation through distribution alignment rather than relying on strict regression losses. Our key insight is to align both generated and reconstructed multi-view distributions toward the ground-truth multi-view distribution, establishing a principled foundation for improved CVC. Observing that generated images exhibit weak CVC while reconstructed images display strong CVC due to explicit rendering, we propose a soft-hard alignment strategy with distinct objectives for generation and reconstruction models. This approach not only enhances generation quality but also dramatically accelerates inference to as few as 4 steps.
As a plug-and-play paradigm, AlignCVC integrates seamlessly with various combinations of multi-view generation models and 3D reconstruction models. Extensive experiments demonstrate the effectiveness and efficiency of AlignCVC for single-image-to-3D generation. Code and models will be made publicly available.

AlignCVC: Aligning Cross-View Consistency for Single-Image-to-3D Generation

Knowledge Graph (KG)-supported Graph Neural Network (GNN) models are becoming increasingly crucial in recommendation systems due to their ability to mitigate the data sparsity challenge. However, these models remain suboptimal because they overlook the representation differences between the inherent user-item Bipartite Graph (BG) and the external head-relation-tail KG, leading to semantic misalignment. Moreover, they indiscriminately incorporate various types of relations from the KG, which may introduce noise into the model, ultimately degrading recommendation performance. To address these challenges, we propose an end-to-end model named Multi-graph Fusion Cross-model Contrastive Learning (MFCCL). To uncover users' interest in items and explore the associations between items, we first construct a user-interest graph by integrating information from both the BG and KG, and an item-association graph derived from the KG. Furthermore, we devise a multi-graph representation learning module that incorporates rich semantics into user and item representations in parallel. Simultaneously, a classical collaborative filtering module is introduced to fully leverage user-item collaborative signals. In addition, we design a novel free data-augmentation cross-model contrastive learning scheme to facilitate the exchange of complementary information between different models. Empirical evaluations on three widely used benchmarks demonstrate that MFCCL achieves significant improvements over the baselines. Further analyses confirm the effectiveness and advantages of the proposed multi-graph fusion representation and cross-model contrastive learning.

Multi-graph Fusion Cross-model Contrastive Learning for Recommendation

Neural signed distance functions (SDFs) have become a vital way to represent 3D shapes or scenes with neural networks. An SDF is an implicit function that can be queried for signed distances at specific coordinates in order to recover a 3D surface. Although implicit functions work well on a single shape or scene, they pose obstacles when analyzing multiple SDFs with high-fidelity geometry details, due to the non-compact representations of SDFs and the loss of geometry details. To overcome these obstacles, we introduce a method to represent multiple SDFs in a common space, aiming to recover more high-fidelity geometry details with more compact latent representations. Our key idea is to combine the benefits of generalization-based and overfitting-based learning strategies, which together preserve high-fidelity geometry details with compact latent codes. Based on this framework, we also introduce a novel sampling strategy to sample training queries. The sampling improves training efficiency and eliminates artifacts caused by the influence of other SDFs. We report numerical and visual evaluations on widely used benchmarks to validate our designs and show advantages over the latest methods in terms of representational ability and compactness.

Learning Compact Latent Space for Representing Neural Signed Distance Functions with High-fidelity Geometry Details

Many optimization tasks involve streaming data with unknown concept drifts, posing a significant challenge known as Streaming Data-Driven Optimization (SDDO). Existing methods, while leveraging surrogate model approximation and historical knowledge transfer, often rely on restrictive assumptions such as fixed drift intervals and full environmental observability, which limits their adaptability to diverse dynamic environments. We propose TRACE, a TRAnsferable Concept-drift Estimator that effectively detects distributional changes in streaming data with varying time scales. TRACE leverages a principled tokenization strategy to extract statistical features from data streams and models drift patterns using attention-based sequence learning, enabling accurate detection on unseen datasets and highlighting the transferability of learned drift patterns. Further, we showcase TRACE's plug-and-play nature by integrating it into a streaming optimizer, facilitating adaptive optimization under unknown concept drifts. Comprehensive experimental results on diverse benchmarks demonstrate the superior generalization, robustness, and effectiveness of our approach in SDDO scenarios. We provide TRACE's code at https://github.com/YTALIEN/TRACE.

TRACE: A Generalizable Drift Detector for Streaming Data-Driven Optimization

In image enhancement tasks, such as low-light and underwater image enhancement, a degraded image can correspond to multiple plausible target images due to dynamic photography conditions, such as variations in illumination. This naturally results in a one-to-many mapping challenge.
To address this, we propose a Bayesian Enhancement Model (BEM) that incorporates Bayesian Neural Networks (BNNs) to capture data uncertainty and produce diverse outputs. To enable fast inference, we introduce a BNN-DNN framework: a BNN is first employed to model the one-to-many mapping in a low-dimensional space, followed by a Deterministic Neural Network (DNN) that refines fine-grained image details.
Extensive experiments on multiple low-light and underwater image enhancement benchmarks demonstrate the effectiveness of our method.

Bayesian Neural Networks for One-to-Many Mapping in Image Enhancement

Point cloud quality assessment (PCQA) has advanced significantly with synthetic datasets offering diverse distortion coverage for model training. However, when applied to new application scenarios, models often suffer from performance drops due to mismatched distortion characteristics between source and target domains. Most current methods use all available synthetic distortions, which may introduce irrelevant features and hinder generalization. To address this, we propose DST-PCQA, a distortion-selective training framework for PCQA. Unlike previous approaches that treat all distortions equally, DST-PCQA identifies and selects distortion types most relevant to a target domain by analyzing inter-domain distortion similarity. This selective strategy reduces negative transfer and enables efficient domain-specific training. To fully leverage the selected distortions for both classification and quality prediction, we adopt a dual-branch architecture that fuses 2D visual cues and 3D geometric structure via cross-modal attention. This design supports multi-level feature alignment across modalities and enables fine-grained distortion understanding. Extensive evaluations across three target domains have verified the effectiveness of DST-PCQA over full-set training baselines. Moreover, its distortion-selective strategy is orthogonal to existing model-based PCQA methods, enabling improved cross-domain performance and reduced training costs across a wide range of architectures.
