In-context learning-based medical segmentation (ICLM) enables foundation models to generalize to unseen cases without retraining. To enhance performance on test queries, existing methods typically follow a two-stage process: (1) using a retrieval encoder (RE) to map both queries and training samples into a shared feature space, and (2) retrieving and utilizing the top-$k$ most similar training samples. While current methods fix the RE and focus on optimizing stage (2), we show that the choice of RE in stage (1) alone can account for over 70% of the performance variation, highlighting RE selection as a critical yet often overlooked factor in ICLM. In this paper, we conduct an analysis of RE selection and make two main findings: (1) dynamically selecting the RE for each query outperforms selecting a fixed RE for the entire task; and (2) feature-space heuristics (e.g., intra-class compactness and inter-class separability) fail to predict RE quality. Motivated by these findings, we propose the \emph{instance-adaptive retrieval encoder selection} (IRES) method, which selects the optimal RE for each query based on output predictions. IRES is built on the intuition that a good RE retrieves relevant demonstrations, helping the ICL model generate more accurate and stable segmentation masks. Accordingly, we introduce the shape stability score (S$^3$), which evaluates the morphological stability of predicted masks under iterative erosion. Experiments show that S$^3$ correlates strongly with true RE quality (Pearson $>$ 0.8), making it a reliable selection proxy. To reduce S$^3$'s per-query cost, we propose parallel prediction with reciprocal neighbor reuse (P2R), which accelerates inference by parallelizing encoding and reusing encoder selections across reciprocal neighbors, avoiding redundant computation. Built on S$^3$ and P2R, IRES improves ICLM performance across FUNDUS, Brain MRI, and Chest X-ray datasets, with a gain of up to 10.6% on fundus segmentation.
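The abstract does not give the exact formula for S$^3$, but the idea of scoring morphological stability under iterative erosion can be sketched as follows. This is a hypothetical, NumPy-only illustration (the erosion routine, the area-ratio averaging, and the parameter `n_iters` are all assumptions, not the paper's definition): a compact, smooth mask retains most of its area as it is repeatedly eroded, while a fragmented or noisy mask collapses quickly and therefore scores lower.

```python
import numpy as np

def binary_erosion(mask: np.ndarray) -> np.ndarray:
    """4-connected binary erosion: a pixel survives only if it and
    all four of its axis-aligned neighbors are foreground."""
    p = np.pad(mask, 1, mode="constant", constant_values=False)
    return (p[1:-1, 1:-1] & p[:-2, 1:-1] & p[2:, 1:-1]
            & p[1:-1, :-2] & p[1:-1, 2:])

def shape_stability_score(mask: np.ndarray, n_iters: int = 5) -> float:
    """Illustrative stability score: mean fraction of the original
    foreground that survives each of n_iters erosion steps.
    Returns a value in [0, 1]; higher means more stable."""
    current = mask.astype(bool)
    total = current.sum()
    if total == 0:
        return 0.0
    ratios = []
    for _ in range(n_iters):
        current = binary_erosion(current)
        ratios.append(current.sum() / total)
    return float(np.mean(ratios))

# A solid disk (stable shape) vs. the same disk with random holes.
yy, xx = np.mgrid[:64, :64]
disk = (yy - 32) ** 2 + (xx - 32) ** 2 <= 20 ** 2
rng = np.random.default_rng(0)
noisy = disk & (rng.random(disk.shape) > 0.2)  # punch 20% holes
print(shape_stability_score(disk), shape_stability_score(noisy))
```

Under this toy definition, the clean disk scores well above the noisy mask, matching the intuition that better retrieved demonstrations yield morphologically stabler predictions.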
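The reuse step of P2R can be pictured with a small sketch of reciprocal-neighbor grouping. Everything here is an assumption for illustration (the cosine-similarity metric, the top-$k$ rule, and the union-find grouping are not taken from the paper): queries whose top-$k$ neighbor lists contain each other are clustered together, so the costly S$^3$-based RE selection can be run once per group and its result reused.

```python
import numpy as np

def reciprocal_neighbor_groups(features: np.ndarray, k: int = 3):
    """Group rows of `features` that are mutual top-k neighbors under
    cosine similarity. Returns a list of index groups; within a group,
    one RE selection could be shared instead of recomputed per query."""
    n = features.shape[0]
    normed = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = normed @ normed.T
    np.fill_diagonal(sim, -np.inf)          # exclude self-matches
    topk = np.argsort(-sim, axis=1)[:, :k]  # each row's k nearest
    neighbors = [set(row) for row in topk]

    # Union-find over reciprocal pairs (i in topk[j] and j in topk[i]).
    parent = list(range(n))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    for i in range(n):
        for j in neighbors[i]:
            if i in neighbors[j]:
                parent[find(i)] = find(int(j))

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())
```

With two well-separated clusters of query embeddings, every group this produces stays inside one cluster, so reuse never mixes unrelated queries; the amortized cost of selection then scales with the number of groups rather than the number of queries.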