Singapore

Contextual Reinforcement Learning (CRL) tackles the problem of solving a set of related Contextual Markov Decision Processes (CMDPs) that vary across different context variables. Traditional approaches, namely independent training and multi-task learning, struggle with either excessive computational costs or negative transfer. A recently proposed multi-policy approach, Model-Based Transfer Learning (MBTL), has demonstrated effectiveness by strategically selecting a few tasks to train on and transferring zero-shot. However, CMDPs encompass a wide range of problems whose structural properties vary from problem to problem, so different task selection strategies suit different CMDPs. In this work, we introduce Structure Detection MBTL (SD-MBTL), a generic framework that dynamically identifies the underlying generalization structure of a CMDP and selects an appropriate MBTL algorithm. For instance, we observe a Mountain structure, in which generalization performance degrades from the training performance of the target task as the context difference increases. We thus propose M/GP-MBTL, which detects the structure and adaptively switches between a Gaussian Process-based approach and a clustering-based approach. Extensive experiments on synthetic data and CRL benchmarks (covering continuous control, traffic control, and agricultural management) show that M/GP-MBTL surpasses the strongest prior method by 12.49% on the aggregated metric. These results highlight the promise of online structure detection for guiding source task selection in complex CRL environments. Our code is available at https://anonymous.4open.science/r/SD-MBTL.
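As a rough illustration of the Mountain structure described in the abstract, the sketch below models the performance of a source policy on a target task as the target's own training performance minus a penalty that grows with the context difference. The linear penalty, the `slope` parameter, and the greedy selector are assumptions made for illustration only; they are not taken from the SD-MBTL code.

```python
import numpy as np

def mountain_transfer(train_perf, contexts, slope):
    """Hypothetical Mountain model: J[i, j] is the performance on target j of a
    policy trained on source i, i.e. the target's training performance minus a
    penalty that is linear in the context difference (an assumption)."""
    contexts = np.asarray(contexts, dtype=float)
    gap = np.abs(contexts[:, None] - contexts[None, :])
    return np.asarray(train_perf, dtype=float)[None, :] - slope * gap

def greedy_sources(J, budget):
    """Illustrative greedy selector: each target is served by its best chosen source."""
    chosen = []
    best = np.full(J.shape[1], -np.inf)
    for _ in range(budget):
        # Total performance over all targets if each candidate source were added.
        totals = np.maximum(J, best[None, :]).sum(axis=1)
        if chosen:
            totals[chosen] = -np.inf  # never pick the same source twice
        pick = int(np.argmax(totals))
        chosen.append(pick)
        best = np.maximum(best, J[pick])
    return chosen, best

contexts = np.linspace(0.0, 1.0, 5)
J = mountain_transfer(train_perf=np.ones(5), contexts=contexts, slope=1.0)
chosen, best = greedy_sources(J, budget=2)
print(chosen, best)
```

With equal training performance everywhere, the first source picked is the middle context, since it has the smallest total distance to all targets.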

AAAI 2026

Structure Detection for Contextual Reinforcement Learning

contextual reinforcement learning

zero-shot transfer

deep reinforcement learning

poster

We are pleased to announce the Fortieth AAAI Conference on Artificial Intelligence (AAAI-26), which will be held at Singapore EXPO from January 20 to January 27, 2026.

The purpose of the AAAI conference series is to promote research in Artificial Intelligence (AI) and foster scientific exchange between researchers, practitioners, scientists, students, and engineers across the entirety of AI and its affiliated disciplines. AAAI-26 will feature technical paper presentations, special tracks, invited speakers, workshops, tutorials, poster sessions, senior member presentations, competitions, exhibit programs, and a range of other activities to be announced.


Score Distillation Sampling has driven recent advances in text-to-3D generation. However, current approaches often fail to produce 3D assets that are both rich in detail and consistent across viewpoints. These limitations primarily arise from imbalanced guidance on fine-grained details and an overdependence on single-view optimization—issues exacerbated by the excessive randomness in selecting diffusion timesteps and camera configurations. Such deficiencies commonly lead to blurry textures and inter-view inconsistencies, which degrade visual realism and hinder practical deployment.

To tackle these challenges, we introduce CoGrad3D, a unified generative refinement framework that adopts a continuously adaptive optimization strategy. By dynamically modulating the optimization focus based on real-time convergence signals, CoGrad3D ensures balanced progress toward both geometric completeness and high-fidelity detail. Concretely, we propose an adaptive region sampling strategy that emphasizes under-converged viewing areas, promoting stable and uniform optimization. To facilitate the transition from coarse geometry to fine-grained reconstruction, we develop a region-aware temporal scheduling scheme that integrates global training dynamics with local convergence feedback. Furthermore, we introduce a gradient fusion mechanism that consolidates historical gradients from adjacent viewpoints, mitigating view-specific artifacts and promoting the emergence of coherent 3D structures. Extensive experiments demonstrate that CoGrad3D substantially surpasses existing methods in both geometric consistency and texture fidelity, enabling the generation of high-quality, view-consistent 3D models from textual descriptions.

CoGrad3D: Spatially-Coupled Timestep Optimization with Orthogonal Gradient Fusion for 3D Generation

Accurate whole-heart segmentation is a critical component in the precise diagnosis and interventional planning of cardiovascular diseases. Integrating complementary information from modalities such as computed tomography (CT) and magnetic resonance imaging (MRI) can significantly enhance segmentation accuracy and robustness. However, existing multi-modal segmentation methods face several limitations: severe spatial inconsistency between modalities hinders effective feature fusion; fusion strategies are often static and lack adaptability; and the processes of feature alignment and segmentation are decoupled and inefficient. To address these challenges, we propose a dual-branch U-Net architecture enhanced by reinforcement learning for feature alignment, termed RL-U$^2$Net, designed for precise and efficient multi-modal 3D whole-heart segmentation. The model employs a dual-branch U-shaped network to process CT and MRI patches in parallel, and introduces a novel RL-XAlign module between the encoders. The module employs a cross-modal attention mechanism to capture semantic correspondences between modalities, while a reinforcement-learning agent learns an optimal rotation strategy that consistently aligns anatomical pose and texture features. The aligned features are then reconstructed through their respective decoders. Finally, an ensemble-learning-based decision module integrates the predictions from individual patches to produce the final segmentation result. Experimental results on the publicly available MM-WHS 2017 dataset demonstrate that the proposed RL-U$^2$Net outperforms existing state-of-the-art methods, achieving Dice coefficients of 93.1% on CT and 87.0% on MRI, thereby validating the effectiveness of the proposed approach.

RL-U2Net: A Dual-Branch UNet with Reinforcement Learning-Assisted Multimodal Feature Fusion for Accurate 3D Whole-Heart Segmentation
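The Dice coefficient reported in the abstract above is the standard overlap score between a predicted and a reference mask. The sketch below computes it for binary 3D volumes; the toy masks and the `eps` smoothing term are illustrative assumptions and say nothing about how the authors evaluate multi-class whole-heart labels.

```python
import numpy as np

def dice_coefficient(pred, target, eps=1e-8):
    """Dice = 2 * |P and T| / (|P| + |T|) for two binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    # eps only guards against division by zero when both masks are empty.
    return 2.0 * intersection / (pred.sum() + target.sum() + eps)

pred = np.zeros((4, 4, 4), dtype=bool)
target = np.zeros((4, 4, 4), dtype=bool)
pred[:2] = True      # 32 voxels predicted as foreground
target[1:3] = True   # 32 voxels labelled as foreground, 16 shared with pred
score = dice_coefficient(pred, target)
print(round(score, 3))  # 2 * 16 / (32 + 32) = 0.5
```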

Personalized text-to-image diffusion models have gained increasing attention because they can generate images that contain unique concepts based on limited training data. However, in continual learning scenarios, these models suffer from concept bleed-through, where newly introduced concepts frequently overwrite or interfere with previously learned concepts. Previous studies have attempted to mitigate this issue at the model adaptation level; however, they fail to fully preserve the distinct semantic representations in the latent space. This paper therefore proposes an adversarial perturbation-based training strategy to address concept bleed-through in continual learning for personalized diffusion models. The proposed method introduces adversarial perturbations into the training images, strategically shifting their semantic representations in the latent space so that newly learned concepts remain distinct and do not interfere with previously acquired knowledge. Unlike structural modifications to the model, the proposed method operates at the data level, which makes it broadly applicable to existing continual personalization frameworks without increasing model complexity. Experimental results demonstrate that the proposed method significantly improves concept separation while maintaining high image fidelity, offering a way to enhance the reliability of continual learning in personalized generative models.

Adversarial Perturbation Shield: Preventing Concept Bleed-through in Continual Learning of Personalized Generative Models

Truth-tracking in collective reasoning systems is a core challenge in domains such as e-democracy, online deliberation, and citizen opinion polling. 
Recently, a novel framework known as opinion-based argumentation has been proposed, aiming to model both voting and argumentation, together with collective opinion semantics designed to select sets of arguments that are mutually coherent and aligned with the agents’ votes.
In this paper, we address the problem of truth-tracking in opinion-based argumentation by formally defining the problem and presenting a systematic empirical analysis of collective opinion semantics.
To support this analysis, we introduce VAST, a comprehensive evaluation framework designed to systematically assess the epistemic adequacy of collective opinion semantics under diverse deliberative conditions; the analysis reveals substantial variation in truth-tracking performance across those conditions.
VAST includes formally defined metrics, a structured methodology for generating synthetic argumentation settings with ground-truth extensions, and a large-scale benchmark covering multiple extension-based semantics, graph types, and vote reliability levels.
Our results show that leveraging argumentation, as opposed to direct vote aggregation, substantially improves epistemic outcomes, particularly in settings with low vote quality or quantity.

Truth-Tracking Evaluation in Opinion-Based Argumentation

We present MCGS (Markov Chain Gaussian Splatting), a novel approach for high-fidelity dynamic scene reconstruction that combines a Markov chain with 3D Gaussian splatting. Our method addresses the critical challenge of artifact-free temporal consistency in dynamic neural rendering. By integrating a Markov chain-based deformation network with multi-head temporal attention, MCGS effectively captures motion patterns and temporal dependencies, producing more accurate and stable 3D representations over time. The key innovations include: (1) a Markov Deform Network that models state transitions while preserving temporal coherence, (2) a temporal attention mechanism that adaptively weights historical states within a sliding window, and (3) strategic noise injection during training to enhance model robustness and generalization. Experiments on representative dynamic scene datasets demonstrate that MCGS outperforms previous methods in both visual quality and temporal coherence, while maintaining competitive rendering speed and efficiency. These results suggest the practical applicability of our approach to real-world dynamic scene understanding and synthesis.

MCGS: Markov Chain Gaussian Splatting for Dynamic Scenes Reconstruction

Ultra-high-resolution (UHR) text-to-image synthesis faces significant hurdles, including immense computational costs and a scarcity of training data. To address these, we introduce RealUHR, an efficient and scalable framework for generating photorealistic 4K images. At its core, RealUHR employs a Patch-Cascade Flow Matching pipeline that ensures global coherence without costly patch fusion by initiating generation from a semantically meaningful structure. This enables highly efficient, few-step inference for independent patches. Our key contribution is Guidance-Consistent Adaptation (GCA), a novel two-stage strategy to resolve the fundamental objective mismatch in guidance-distilled models. GCA allows powerful backbones like FLUX to be effectively adapted for patch-aware UHR synthesis. The framework's detail-rendering capabilities are further enhanced by a non-uniform time schedule. Experiments show that RealUHR establishes superior performance in both quality and efficiency, and excels in zero-shot applications such as creative up-sampling and generative artifact suppression.

RealUHR: Harnessing Patch-Cascade Flows for Photorealistic Ultra-High-Resolution Synthesis

3D Gaussian Splatting has emerged as a transformative technique in novel view synthesis, primarily due to its high rendering speed and photorealistic fidelity. However, its memory footprint scales rapidly with scene complexity, often reaching several gigabytes. Existing methods address this issue by introducing compression strategies that exploit primitive-level redundancy through similarity detection and quantization. We aim to surpass the compression limits of such methods by incorporating symmetry-aware techniques, specifically targeting mirror symmetries to eliminate redundant primitives. We propose a novel compression framework, SymGS, which introduces learnable mirrors into the scene, thereby eliminating local and global reflective redundancies. Our framework functions as a plug-and-play enhancement to state-of-the-art compression methods (e.g., HAC) to achieve further compression. Compared to HAC, we achieve $1.66\times$ compression across benchmark datasets (up to $3\times$ on large-scale scenes). On average, SymGS enables $108\times$ compression of a 3DGS scene while preserving rendering quality.

SymGS: Leveraging Reflective Symmetries for 3DGS Compression
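To make the mirror-symmetry idea in the abstract above concrete, the sketch below reflects a set of primitive centers across a plane, so that only one half of a symmetric structure needs to be stored. It handles positions only and uses a fixed, hand-chosen plane; how SymGS parameterizes and learns its mirrors, and how it treats orientation, scale, or color, is not stated in the abstract and is not modeled here.

```python
import numpy as np

def reflect_points(points, normal, offset):
    """Reflect points across the plane {x : n . x + offset = 0}.

    `normal` is normalized internally; `offset` is measured along the unit normal.
    """
    n = np.asarray(normal, dtype=float)
    n = n / np.linalg.norm(n)
    points = np.asarray(points, dtype=float)
    signed_dist = points @ n + offset          # signed distance of each point to the plane
    return points - 2.0 * signed_dist[:, None] * n[None, :]

# Store one half of a symmetric structure and recover the other half on demand.
half = np.array([[1.0, 2.0, 3.0], [0.5, -1.0, 0.0]])
mirrored = reflect_points(half, normal=[1.0, 0.0, 0.0], offset=0.0)
print(mirrored)
```

Reflecting twice returns the original points, which is a quick sanity check for any mirror parameterization.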

The Internet of Things generates massive data streams, with edge computing emerging as a key enabler for online IoT applications and 5G networks. Edge solutions facilitate real-time machine learning inference, but also require continuous adaptation to concept drifts. While extensions of the Very Fast Decision Tree (VFDT) remain state-of-the-art for tabular stream mining, their unregulated growth limits efficiency, particularly in ensemble settings where post-pruning at the individual tree level is seldom applied. This paper presents DFDT, a novel memory-constrained algorithm for online learning. DFDT employs activity-aware pre-pruning, dynamically adjusting splitting criteria based on leaf node activity: low-activity nodes are deactivated to conserve resources, moderately active nodes split under stricter conditions, and highly active nodes leverage a skipping mechanism for accelerated growth. Additionally, adaptive grace periods and tie thresholds allow DFDT to modulate splitting decisions based on observed data variability, enhancing the accuracy-memory-runtime trade-off while minimizing the need for hyperparameter tuning. An ablation study reveals three DFDT variants suited to different resource profiles. Fully compatible with existing ensemble frameworks, DFDT provides a drop-in alternative to standard VFDT-based learners.

DFDT: Dynamic Fast Decision Tree for IoT Data Stream Mining on Edge Devices
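For context on the grace periods and tie thresholds mentioned in the abstract above, the sketch below shows the split test of a plain VFDT (Hoeffding tree), which DFDT builds on. It does not implement DFDT's activity-aware or adaptive logic; the default values for `delta` and `tie_threshold` are illustrative assumptions.

```python
import math

def hoeffding_bound(value_range, delta, n):
    """epsilon = sqrt(R^2 * ln(1 / delta) / (2 * n)) after n observations at a leaf."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))

def should_split(best_gain, second_gain, n, value_range=1.0,
                 delta=1e-7, tie_threshold=0.05):
    """Standard VFDT test: split when the best attribute is confidently better
    than the runner-up, or when the two are so close that waiting is pointless."""
    eps = hoeffding_bound(value_range, delta, n)
    return (best_gain - second_gain) > eps or eps < tie_threshold

few = should_split(0.30, 0.10, n=50)     # too few samples, keep collecting
many = should_split(0.30, 0.10, n=1000)  # gain gap now exceeds the bound
tie = should_split(0.30, 0.29, n=5000)   # near-tie resolved by the tie threshold
print(few, many, tie)
```

In a VFDT this test is typically evaluated only once per grace period, that is, after a fixed number of new samples has reached the leaf.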

Weight Quantization (WQ) is a key technique for lightweight Deep Neural Network (DNN) computations. While existing algorithms often pursue memory compression and inference acceleration with accuracy comparable to full-precision models, the effect of WQ on DNN uncertainty remains largely unexplored. In this paper, we quantify the impact of WQ on DNN uncertainty through the novel Exact Moment Propagation (EMP) uncertainty estimator. We observe that WQ significantly increases DNN uncertainty. Based on the EMP estimator, we propose MOMent Alignment (MOMA) to reduce WQ-induced uncertainty and preserve the accuracy of weight-quantized DNNs. Empirical results across various DNN architectures and datasets validate the effectiveness of both EMP and MOMA.

On the Impact of Weight Quantization on Deep Neural Network Uncertainty
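As background for the abstract above, the sketch below shows one common form of weight quantization: symmetric uniform quantization of a weight tensor to a given bit width. It illustrates only the quantization step; the EMP estimator and MOMA are not described in enough detail in the abstract to sketch, and the specific scheme used by the authors is not stated, so this choice is an assumption.

```python
import numpy as np

def quantize_symmetric(weights, bits):
    """Round weights onto a symmetric uniform grid with 2**(bits-1) - 1 positive levels."""
    w = np.asarray(weights, dtype=float)
    levels = 2 ** (bits - 1) - 1
    max_abs = np.max(np.abs(w))
    if max_abs == 0.0:
        return w.copy(), 0.0
    scale = max_abs / levels                      # step size between adjacent grid points
    q = np.clip(np.round(w / scale), -levels, levels)
    return q * scale, scale

w = np.array([-1.0, -0.4, 0.0, 0.26, 1.0])
w_q, scale = quantize_symmetric(w, bits=3)
max_error = np.max(np.abs(w - w_q))
print(w_q, max_error)
```

The rounding error per weight is at most half a step, which is the perturbation whose effect on predictive uncertainty the paper sets out to measure.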

We propose HiCL, a novel hippocampal-inspired dual-memory continual learning architecture designed to mitigate catastrophic forgetting by using elements inspired by the hippocampal circuitry. Our system encodes inputs through a grid-cell-like layer, followed by sparse pattern separation using a dentate gyrus-inspired module with top-$k$ sparsity. Episodic memory traces are maintained in a CA3-like autoassociative memory. Task-specific processing is dynamically managed via a DG-gated mixture-of-experts mechanism, wherein inputs are routed to experts based on cosine similarity between their normalized sparse DG representations and learned task-specific DG prototypes computed through online exponential moving averages. This biologically grounded yet mathematically principled gating strategy enables differentiable, scalable task-routing without relying on a separate gating network, and enhances the model's adaptability and efficiency in learning multiple sequential tasks. Cortical outputs are consolidated using Elastic Weight Consolidation weighted by inter-task similarity. Crucially, we incorporate prioritized replay of stored patterns to reinforce essential past experiences. Evaluations on standard continual learning benchmarks demonstrate the effectiveness of our architecture in reducing task interference, achieving near state-of-the-art results in continual learning tasks at lower computational costs.
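The routing step described in the abstract above (top-$k$ sparse codes compared to task prototypes by cosine similarity, with prototypes tracked by an exponential moving average) can be sketched as follows. The function names, the `momentum` value, and the hard `argmax` choice are illustrative assumptions; the abstract describes the actual gating as differentiable, which this simplified version is not.

```python
import numpy as np

def top_k_sparsify(x, k):
    """Keep the k largest activations and zero out the rest."""
    x = np.asarray(x, dtype=float)
    sparse = np.zeros_like(x)
    keep = np.argpartition(x, -k)[-k:]   # indices of the k largest entries
    sparse[keep] = x[keep]
    return sparse

def route(code, prototypes, eps=1e-12):
    """Return the index of the prototype with the highest cosine similarity to the code."""
    code = code / (np.linalg.norm(code) + eps)
    protos = prototypes / (np.linalg.norm(prototypes, axis=1, keepdims=True) + eps)
    sims = protos @ code
    return int(np.argmax(sims)), sims

def ema_update(prototype, code, momentum=0.9):
    """Move a task prototype slightly toward the newly observed sparse code."""
    return momentum * prototype + (1.0 - momentum) * code

prototypes = np.array([[0.0, 1.0, 0.0, 1.0, 0.0],
                       [1.0, 0.0, 1.0, 0.0, 0.0]])
code = top_k_sparsify([0.1, 0.9, 0.2, 0.8, 0.0], k=2)
expert, sims = route(code, prototypes)
prototypes[expert] = ema_update(prototypes[expert], code)
print(expert, prototypes[expert])
```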
