Learnable sparse retrieval (LSR) models encode texts into high-dimensional sparse representations, supporting token-level expansion beyond the original text and addressing the vocabulary mismatch problem of traditional bag-of-words retrieval. However, in the absence of representation-level supervision, these representations often overemphasize irrelevant tokens while neglecting truly relevant ones. We term this phenomenon the Representation Hallucination problem in LSR models, a critical bottleneck impeding accurate retrieval. To address this challenge, we introduce SiRe, a self-improving training framework for sparse retrieval that integrates two core strategies: Heuristic Representation Refinement and Representation-Focused Learning. Specifically, SiRe first identifies and corrects representation hallucinations in the outputs of the current LSR model using heuristic methods. The resulting representations then serve as the primary supervision signals, guiding a pretrained language model (e.g., BERT) to mitigate the problem directly at the representation level. This process can be iterated, enabling progressive model improvement. Extensive experiments on both in-domain and out-of-domain benchmarks show that SiRe produces higher-quality sparse representations, significantly enhancing retrieval performance over strong baselines.
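The abstract does not spell out the refinement heuristics, but the general idea can be illustrated. In LSR systems, a text's representation is a sparse map from vocabulary tokens to weights, and hallucination shows up as high weights on tokens unrelated to the text. The sketch below is hypothetical: the function name, thresholds, and rules (boost tokens that actually occur in the document, keep only confident expansions) are illustrative stand-ins for whatever heuristics SiRe actually uses.

```python
def refine_representation(rep, doc_tokens, floor=0.1, boost=0.5):
    """Hypothetical heuristic refinement of a sparse representation.

    rep: dict mapping token -> weight (the LSR model's sparse output)
    doc_tokens: set of tokens that literally occur in the document
    floor, boost: illustrative thresholds, not values from the paper.

    Rules in this sketch:
      - tokens present in the document are reinforced (truly relevant),
      - expansion tokens below `floor` are dropped (likely hallucinated),
      - remaining expansion tokens are kept as-is.
    """
    refined = {}
    for tok, weight in rep.items():
        if tok in doc_tokens:
            refined[tok] = weight + boost   # reinforce in-text tokens
        elif weight >= floor:
            refined[tok] = weight           # keep confident expansions
        # else: drop low-weight expansions as hallucinations
    return refined

# Toy example: "banana" is a hallucinated expansion and gets pruned.
doc = {"sparse", "retrieval", "model"}
rep = {"sparse": 0.9, "retrieval": 0.8, "banana": 0.05, "search": 0.4}
print(refine_representation(rep, doc))
```

The refined maps would then act as representation-level supervision targets for the next training round, which is what makes the loop self-improving.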