Singapore

Reinforcement learning (RL) has shown significant promise in sequential portfolio optimization. A typical solution optimizes cumulative returns on historical offline data. However, this can produce poorly generalizing policies that merely "memorize" the optimal buying and selling actions in the offline data while ignoring the non-stationary nature of financial markets. We frame portfolio optimization on stock data as a specific type of offline RL problem. Our method, MetaTrader, makes two key contributions. First, it introduces a novel bilevel RL algorithm that operates on both the original stock data and its transformations; the core idea is that a robust policy should generalize effectively to out-of-distribution data. Second, we propose a new temporal difference (TD) method that uses a transformation-based conservative TD target to address value overestimation under limited offline data. Empirical results on two publicly available datasets show that MetaTrader outperforms existing methods, including both traditional stock prediction models and RL-based trading approaches.
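The abstract does not specify which transformations MetaTrader uses or exactly how they enter the TD target. The sketch below illustrates one plausible reading under stated assumptions: the hypothetical transformations `scale_prices` and `jitter_prices` perturb a price window, and the target bootstraps from the *minimum* value estimate over the original and transformed next states, a common pessimism device in offline RL for curbing overestimation. None of these names come from the paper.

```python
import random
from typing import Callable, List, Sequence

State = List[float]  # hypothetical: a window of normalized prices


def scale_prices(state: Sequence[float], factor: float = 1.05) -> State:
    """Hypothetical transformation: rescale the whole price window."""
    return [p * factor for p in state]


def jitter_prices(state: Sequence[float], sigma: float = 0.01, seed: int = 0) -> State:
    """Hypothetical transformation: add small seeded Gaussian noise."""
    rng = random.Random(seed)
    return [p + rng.gauss(0.0, sigma) for p in state]


def conservative_td_target(
    reward: float,
    next_state: Sequence[float],
    q_value: Callable[[Sequence[float]], float],
    transforms: List[Callable[[Sequence[float]], State]],
    gamma: float = 0.99,
    done: bool = False,
) -> float:
    """Bootstrap from the lowest value estimate over the original next state
    and its transformed views, which biases the target downward."""
    if done:
        return reward  # no bootstrapping past a terminal step
    views = [list(next_state)] + [t(next_state) for t in transforms]
    return reward + gamma * min(q_value(v) for v in views)
```

With `gamma=0.5`, reward `0.1`, a toy value function equal to the window mean, and transformations that scale `[1.0, 1.0]` by 1.05 and by 0.9, the target is `0.1 + 0.5 * 0.9 = 0.55`: the downscaled view supplies the smallest estimate.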

AAAI 2026

MetaTrader: Learning to Generalize RL Trading Policies Beyond Offline Data

portfolio optimization

meta learning

reinforcement learning


poster

We are pleased to announce the Fortieth AAAI Conference on Artificial Intelligence (AAAI-26), which will be held at the Singapore EXPO from January 20 to January 27, 2026.

The purpose of the AAAI conference series is to promote research in Artificial Intelligence (AI) and foster scientific exchange between researchers, practitioners, scientists, students, and engineers across the entirety of AI and its affiliated disciplines. AAAI-26 will feature technical paper presentations, special tracks, invited speakers, workshops, tutorials, poster sessions, senior member presentations, competitions, exhibit programs, and a range of other activities to be announced.


Inference-time scaling has emerged as a powerful technique for enhancing the reasoning performance of Large Language Models (LLMs). However, existing approaches often rely on heuristic strategies for parallel sampling and lack a principled foundation. To address this gap, we propose a probabilistic framework that formalizes the optimality of inference-time scaling under the assumptions that parallel samples are independent and identically distributed (i.i.d.) and that the Best-of-N selection strategy follows an estimable probability distribution. Within this framework, we derive a theoretical lower bound on the number of samples required to achieve a target performance level, providing the first principled guidance for compute-efficient scaling. Leveraging this insight, we develop OptScale, a practical algorithm that dynamically determines the optimal number of sampled responses. OptScale employs a language model-based predictor to estimate probabilistic prior parameters, which lets it choose the minimal number of samples that satisfies predefined performance thresholds and confidence levels. Extensive experiments on mathematical reasoning benchmarks (including MATH-500, GSM8K, AIME, and AMC) demonstrate that OptScale significantly reduces sampling overhead while matching or exceeding state-of-the-art reasoning performance. Our work offers both a theoretical foundation and a practical solution for principled inference-time scaling, addressing a critical gap in the efficient deployment of LLMs for complex reasoning. The source code will be released upon acceptance.
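As a simplified illustration of why the i.i.d. assumption yields a lower bound on the sample count (this is not OptScale's actual bound, which the abstract does not state), assume each sampled response independently meets the target with probability p. Best-of-N then fails only if all N samples fail, so reaching confidence α requires 1 - (1 - p)^N ≥ α, i.e. N ≥ ln(1 - α) / ln(1 - p). The helper names below are hypothetical.

```python
import math


def success_probability(p: float, n: int) -> float:
    """P(at least one of n i.i.d. samples succeeds) = 1 - (1 - p)**n."""
    if p >= 1.0:
        return 1.0 if n >= 1 else 0.0
    # log1p/expm1 keep precision when p is tiny
    return -math.expm1(n * math.log1p(-p))


def min_samples(p: float, confidence: float) -> int:
    """Smallest N >= 1 with 1 - (1 - p)**N >= confidence."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    if not 0.0 <= confidence < 1.0:
        raise ValueError("confidence must lie in [0, 1)")
    if p == 1.0:
        return 1  # a single sample always succeeds
    n = max(1, math.ceil(math.log1p(-confidence) / math.log1p(-p)))
    # Guard against floating-point rounding right at the boundary.
    while n > 1 and success_probability(p, n - 1) >= confidence:
        n -= 1
    while success_probability(p, n) < confidence:
        n += 1
    return n
```

For example, p = 0.3 and α = 0.9 give N = 7, while p = 0.9 needs only a single sample for α = 0.5; per-problem estimates of p are what make a dynamic N cheaper than a fixed one.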

OptScale: Probabilistic Optimality for Inference-time Scaling

Without manual annotations, unsupervised cross-modal hashing (UCMH) aims to achieve efficient clustering and retrieval by leveraging data interrelationships. However, retrieval accuracy is constrained by two main factors: 1) insufficient exploration of data relationships; 2) existing knowledge mining strategies are not well aligned with the architectural properties of multilayer perceptrons. Through summarization and error analysis, the human brain can learn quickly from experience and minimal data. Inspired by this cognitive process, we propose a novel Error Notebook strategy, named ENHash, to capture similarity information between multi-modal data more effectively for fine-grained unsupervised clustering. First, simulating the human process of summarizing experiences, ENHash gradually integrates the information from each batch into a global clustering representation. Second, drawing on the human capacity for error analysis, ENHash uses the summarized experiences to identify and record incorrectly predicted hash codes. Finally, by leveraging the knowledge derived from this analysis, ENHash guides the hash function to learn fine-grained patterns from these errors. To the best of our knowledge, ENHash is the first attempt to integrate cognitively inspired mechanisms into fine-grained UCMH optimization. We evaluate ENHash against eight state-of-the-art methods on three widely used datasets and one fine-grained cross-modal dataset. Experimental results show that ENHash achieves substantial improvements over existing approaches. To support reproducibility, the experimental code has been uploaded to the following anonymous repository: https://osf.io/tbehv/?view_only=e4470e710bdf411589391807a2914218.
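The abstract describes ENHash's two cognitive steps only at a high level. The following hypothetical sketch shows one way they could look: a momentum (EMA) update folds each batch into global cluster centroids ("summarizing experience"), and an error notebook records samples whose hash code falls nearest to a centroid other than their pseudo-label ("error analysis"). Pseudo-labels are assumed to come from an upstream clustering step, and all function names are illustrative rather than ENHash's implementation.

```python
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_centroid(code: Sequence[float], centroids: Sequence[Sequence[float]]) -> int:
    """Index of the centroid closest (squared Euclidean) to a hash code."""
    return min(range(len(centroids)), key=lambda k: sq_dist(code, centroids[k]))


def update_centroids(
    centroids: Sequence[Sequence[float]],
    batch_codes: Sequence[Sequence[float]],
    pseudo_labels: Sequence[int],
    momentum: float = 0.9,
) -> List[Vector]:
    """'Summarize experience': fold one batch into the global centroids via EMA."""
    new = [list(c) for c in centroids]
    for k in range(len(new)):
        members = [c for c, y in zip(batch_codes, pseudo_labels) if y == k]
        if not members:
            continue  # this batch carries no evidence for cluster k
        mean = [sum(vals) / len(members) for vals in zip(*members)]
        new[k] = [momentum * old + (1 - momentum) * m for old, m in zip(new[k], mean)]
    return new


def record_errors(
    notebook: Dict[int, Tuple[int, int]],
    batch_ids: Sequence[int],
    batch_codes: Sequence[Sequence[float]],
    pseudo_labels: Sequence[int],
    centroids: Sequence[Sequence[float]],
) -> Dict[int, Tuple[int, int]]:
    """'Error analysis': note samples whose code lands in the wrong cluster,
    stored as sample_id -> (predicted cluster, pseudo-label)."""
    for i, code, y in zip(batch_ids, batch_codes, pseudo_labels):
        pred = nearest_centroid(code, centroids)
        if pred != y:
            notebook[i] = (pred, y)
        else:
            notebook.pop(i, None)  # a corrected sample leaves the notebook
    return notebook
```

Recorded entries could then be emphasized when training the hash function; how ENHash actually uses them is not specified in the abstract.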

ENHash: Error Notebook-Guided Fine-Grained Learning for Unsupervised Cross-Modal Hashing

Sequential knowledge editing techniques aim to continuously update the knowledge in large language models at low cost, preventing them from generating outdated or incorrect information. However, existing sequential editing methods suffer a significant decline in editing success rate after long-term editing. Through theoretical analysis and experiments, we find that as the number of edits increases, the model's output deviates increasingly from the desired target, causing editing success rates to drop. We refer to this issue as the **superimposed noise accumulation problem**. Further analysis shows that the problem is related to the erroneous activation of irrelevant knowledge and to conflicts between activated knowledge. Based on this analysis, we propose **DeltaEdit**, a method that reduces conflicts between knowledge through dynamic orthogonal constraint strategies. Experiments show that DeltaEdit significantly reduces superimposed noise, achieving a 16.8% improvement in editing performance over the strongest baseline.
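The abstract gives no details of DeltaEdit's "dynamic orthogonal constraint strategies". As a generic, hypothetical illustration of an orthogonality constraint between edits, the sketch below removes from a new update vector every component that lies in the span of earlier edit directions (via Gram-Schmidt), so the new edit cannot move along directions already used by previous edits. Real editors operate on weight matrices; vectors are used here only to keep the example small.

```python
from typing import List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def orthonormal_basis(vectors: Sequence[Sequence[float]], eps: float = 1e-10) -> List[Vector]:
    """Gram-Schmidt: orthonormal basis for the span of earlier edit directions."""
    basis: List[Vector] = []
    for v in vectors:
        w = list(v)
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = dot(w, w) ** 0.5
        if norm > eps:  # skip directions already covered by the basis
            basis.append([wi / norm for wi in w])
    return basis


def project_out(update: Sequence[float], previous: Sequence[Sequence[float]]) -> Vector:
    """Remove every component of `update` that lies in span(previous)."""
    w = list(update)
    for b in orthonormal_basis(previous):
        c = dot(w, b)
        w = [wi - c * bi for wi, bi in zip(w, b)]
    return w
```

With earlier directions (1, 0, 0) and (1, 1, 0), the update (3, 4, 5) is reduced to (0, 0, 5), the only component orthogonal to both.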

On the Superimposed Noise Accumulation Problem in Sequential Knowledge Editing of Large Language Models

Long-form books are among the most information-rich and structurally complex forms of written content, often exceeding 100,000 words. While recent methods have enabled basic long-text generation, they remain limited in two key aspects: the inability to generate ultra-long content at book scale, and the lack of mechanisms for integrating rich factual information. To address these limitations, we propose DeepWriter, a multi-agent collaborative framework that follows a structured planning-then-generation paradigm. It first constructs a detailed book outline with narrative arcs and chapter semantics, then incrementally generates content conditioned on retrieved knowledge and contextual signals. DeepWriter supports controllable generation of full-length books exceeding 100,000 words, enriched with citations, trivia, and images. To support evaluation beyond surface-level fluency, we introduce DeepWriter-Bench, a bilingual benchmark of 18 annotated books designed to assess book-scale coherence, richness, and factual grounding. Additionally, we propose BookScore, a unified 100-point metric for quantifying book maturity. Experimental results show that DeepWriter achieves a state-of-the-art BookScore of 80.23, consistently outperforming strong baselines. Code and resources are available to support future research.
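The planning-then-generation paradigm can be sketched as a simple loop. In this hypothetical example, `plan`, `retrieve`, and `generate` are stand-ins for the LLM agents and retrieval components, which the abstract does not specify; the point is only the control flow: build the whole outline first, then write chapters in order, conditioning each on retrieved knowledge and on the tail of the text written so far.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Chapter:
    title: str
    summary: str  # chapter semantics produced by the planning stage
    text: str = ""


@dataclass
class Book:
    topic: str
    chapters: List[Chapter] = field(default_factory=list)


def write_book(
    topic: str,
    plan: Callable[[str], List[Tuple[str, str]]],
    retrieve: Callable[[str], List[str]],
    generate: Callable[[Chapter, List[str], str], str],
    context_chars: int = 2000,
) -> Book:
    """Plan the full outline first, then write chapters in order."""
    book = Book(topic, [Chapter(title, summary) for title, summary in plan(topic)])
    written = ""
    for chapter in book.chapters:
        facts = retrieve(chapter.summary)   # knowledge for this chapter
        context = written[-context_chars:]  # contextual signal from earlier text
        chapter.text = generate(chapter, facts, context)
        written += chapter.text
    return book
```

Any callables with these signatures can be plugged in; in a real system each would wrap model or search calls.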

DeepWriter: A Multi-Agent Collaboration Framework for Information-rich Ultra-long Book Writing

Multi-model fitting is fundamental to robust geometric estimation in computer vision. Recent deep learning methods enable parallel model detection, but they rely on simple architectures that inadequately model spatial relationships. Moreover, current methods typically generate hypotheses only by applying minimal solvers to randomly sampled points and thus fail to explore the full diversity of the solution space. To address these limitations, we propose a novel Jacobian-based Gaussian uncertainty modeling framework, which analytically propagates covariance through geometric transformations and enables efficient expansion of the hypothesis space with strong theoretical guarantees. We further introduce a Gaussian Hypothesis Generation Network (GHG-Net) that learns global parameter distributions, enabling the generation of diverse and geometrically valid hypotheses. Additionally, our network captures spatial relationships among observations by employing a dynamic graph neural network with a multi-head attention mechanism. This yields more accurate sample and inlier weights, significantly improving the quality of hypothesis generation. Extensive experiments on three representative geometric estimation tasks (i.e., vanishing point detection, fundamental matrix estimation, and homography estimation) demonstrate that our method achieves new state-of-the-art accuracy and stability while maintaining high computational efficiency.
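The abstract does not give the paper's parameterization, so the sketch below assumes the standard first-order (Jacobian) rule Σ_y ≈ J Σ_x Jᵀ and uses a line through two noisy 2-D points as a hypothetical geometric transformation. A central-difference Jacobian stands in for the analytic one the paper derives; all function names are illustrative.

```python
from typing import Callable, List, Sequence

Matrix = List[List[float]]


def numerical_jacobian(f: Callable[[List[float]], List[float]],
                       x: Sequence[float], h: float = 1e-6) -> Matrix:
    """Central-difference Jacobian, J[i][j] = d f_i / d x_j at x."""
    m = len(f(list(x)))
    J = [[0.0] * len(x) for _ in range(m)]
    for j in range(len(x)):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        fp, fm = f(xp), f(xm)
        for i in range(m):
            J[i][j] = (fp[i] - fm[i]) / (2 * h)
    return J


def matmul(A: Matrix, B: Matrix) -> Matrix:
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A: Matrix) -> Matrix:
    return [list(row) for row in zip(*A)]


def propagate_covariance(f: Callable[[List[float]], List[float]], x: Sequence[float],
                         cov_x: Matrix, h: float = 1e-6) -> Matrix:
    """First-order propagation: Sigma_y ~= J Sigma_x J^T, with J evaluated at x."""
    J = numerical_jacobian(f, x, h)
    return matmul(matmul(J, cov_x), transpose(J))


def line_through_points(p: List[float]) -> List[float]:
    """Homogeneous line through (x1, y1) and (x2, y2): cross((x1, y1, 1), (x2, y2, 1))."""
    x1, y1, x2, y2 = p
    return [y1 - y2, x2 - x1, x1 * y2 - x2 * y1]
```

For two points at (0, 0) and (1, 1) with isotropic variance 0.01 per coordinate, the propagated 3×3 line covariance has 0.02 on the diagonal and -0.01 coupling the offset term to both direction terms, so hypothesis uncertainty follows directly from point uncertainty without resampling.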

Gaussian Uncertainty-Driven Multi-Model Fitting with Graph Neural Network

The method used to measure relationships between face embeddings plays a crucial role in determining the performance of face clustering. Existing methods employ the Jaccard similarity coefficient instead of the traditional cosine distance to improve measurement accuracy. However, these methods introduce an excessive number of irrelevant nodes, producing Jaccard coefficients with limited discriminative power and degrading clustering performance. To address this issue, we propose a prediction-driven Top-K Jaccard similarity coefficient that enhances the purity of neighboring nodes, thereby improving the reliability of similarity measurements. Nevertheless, accurately predicting the optimal number of neighbors (Top-K) remains challenging and can lead to suboptimal clustering results. To overcome this limitation, we develop a Transformer-based prediction model that examines the relationships between the central node and its neighboring nodes near the Top-K boundary to further enhance the reliability of similarity estimation. However, when applied to predicting relationships between nodes, the vanilla Transformer often introduces noise because it overemphasizes irrelevant feature relationships. To address this, we replace the vanilla Transformer with a Sparse Differential Transformer (SDT) designed to eliminate this noise and improve the model's robustness to it. Extensive experiments on multiple datasets, such as MS-Celeb-1M, demonstrate that our approach achieves state-of-the-art (SOTA) performance, outperforming existing methods and providing a more robust solution for face clustering.
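A minimal sketch of a Top-K Jaccard coefficient follows, assuming cosine similarity for neighbor ranking and the standard set definition |A ∩ B| / |A ∪ B|. In the paper, K is predicted per node; here the per-node values `k_i` and `k_j` are passed in by hand, and the function names are hypothetical.

```python
import math
from typing import Sequence, Set


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; assumes non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top_k_neighbors(embs: Sequence[Sequence[float]], i: int, k: int) -> Set[int]:
    """The k most cosine-similar embeddings to embs[i] (typically including i)."""
    order = sorted(range(len(embs)), key=lambda j: cosine(embs[i], embs[j]), reverse=True)
    return set(order[:k])


def top_k_jaccard(embs: Sequence[Sequence[float]], i: int, j: int, k_i: int, k_j: int) -> float:
    """Jaccard coefficient |A & B| / |A | B| of the two nodes' Top-K neighbor sets."""
    a, b = top_k_neighbors(embs, i, k_i), top_k_neighbors(embs, j, k_j)
    return len(a & b) / len(a | b)
```

On the toy embeddings `[[1, 0], [0.9, 0.1], [0, 1], [0.1, 0.9]]`, nodes 0 and 2 belong to different groups and get a coefficient of 0.0 with K = 2, but 0.5 with K = 3, because each neighbor set now pulls in a node from the other group. This is the loss of discriminative power from irrelevant neighbors that motivates choosing K carefully.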

Enhancing Noise Resilience in Face Clustering via Sparse Differential Transformer

Legal dispute mediation plays a crucial role in resolving civil disputes, yet its empirical study is limited by privacy constraints and complex multivariate interactions. To address this limitation, we present AgentMediation, the first LLM-based agent framework for simulating dispute mediation. It simulates realistic mediation processes grounded in real-world disputes and enables controlled experimentation on key variables such as disputant strategies, dispute causes, and mediator expertise. Our empirical analysis reveals patterns consistent with sociological theories, including Group Polarization and Surface-level Consensus. As a comprehensive and extensible platform, AgentMediation paves the way for deeper integration of social science and AI in legal research.

Simulating Dispute Mediation with LLM-Based Agents for Legal Research

Ad hoc teamwork (AHT) requires agents to collaborate with previously unseen teammates, which is crucial for many real-world applications. The core challenge of AHT is to develop an ego agent that can predict and adapt to unknown teammates on the fly. Conventional RL-based approaches optimize a single expected return, which often causes policies to collapse into a single dominant behavior; as a result, they fail to capture the multimodal cooperation patterns inherent in AHT. In this work, we introduce PADiff, a diffusion-based approach that captures the agent's multimodal behaviors, unlocking diverse modes of cooperation with teammates. However, standard diffusion models lack the ability to predict and adapt in non-stationary AHT scenarios. To address this limitation, we propose a novel diffusion-based policy that integrates critical predictive information about teammates into the denoising process. Extensive experiments across three environments demonstrate that PADiff significantly outperforms existing AHT methods.
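To show where teammate information can enter the denoising process, here is the standard DDPM ancestral sampler for an action vector, with a hypothetical `teammate_ctx` argument passed to the noise predictor at every step. The linear beta schedule, the `eps_model` signature, and the form of the context are assumptions; the abstract does not describe how PADiff injects its predictive information.

```python
import math
import random
from typing import Callable, List, Sequence

EpsModel = Callable[[List[float], int, Sequence[float]], List[float]]


def linear_betas(T: int, beta_start: float = 1e-4, beta_end: float = 0.02) -> List[float]:
    """Linear noise schedule over T >= 2 steps."""
    return [beta_start + (beta_end - beta_start) * t / (T - 1) for t in range(T)]


def sample_action(eps_model: EpsModel, teammate_ctx: Sequence[float], action_dim: int,
                  T: int = 50, seed: int = 0) -> List[float]:
    """DDPM ancestral sampling; each step's noise prediction sees teammate_ctx."""
    rng = random.Random(seed)
    betas = linear_betas(T)
    alphas = [1.0 - b for b in betas]
    alpha_bars, prod = [], 1.0
    for a in alphas:
        prod *= a
        alpha_bars.append(prod)
    x = [rng.gauss(0.0, 1.0) for _ in range(action_dim)]  # start from pure noise
    for t in reversed(range(T)):
        eps = eps_model(x, t, teammate_ctx)
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        mean = [(xi - coef * ei) / math.sqrt(alphas[t]) for xi, ei in zip(x, eps)]
        if t > 0:
            sigma = math.sqrt(betas[t])  # the simple sigma_t^2 = beta_t choice
            x = [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
        else:
            x = mean  # no noise on the final step
    return x
```

In practice `eps_model` would be a trained network; any callable with the same signature works for trying out the sampler.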

PADiff: Predictive and Adaptive Diffusion Policies for Ad Hoc Teamwork

Real-time, high-fidelity monocular depth estimation from remote sensing imagery is crucial for numerous applications, yet existing methods face a stark trade-off between accuracy and efficiency. Although Vision Transformer (ViT) backbones for dense prediction are fast, they often yield poor perceptual quality. Conversely, diffusion models offer high fidelity but at a prohibitive computational cost. To overcome these limitations, we propose Depth Detail Diffusion for Remote Sensing Monocular Depth Estimation (D3-RSMDE), an efficient framework designed to achieve an optimal balance between speed and quality. Our framework first leverages a ViT-based module to rapidly generate a high-quality preliminary depth map, which serves as a structural prior and effectively replaces the time-consuming initial structure generation stage of diffusion models. Based on this prior, we propose a Progressive Linear Blending Refinement (PLBR) strategy, which uses a lightweight U-Net to refine the details in only a few iterations. The entire refinement step operates efficiently in a compact latent space supported by a Variational Autoencoder (VAE). Extensive experiments demonstrate that D3-RSMDE achieves a notable 11.85% reduction in the Learned Perceptual Image Patch Similarity (LPIPS) perceptual metric compared with leading models such as Marigold, while also achieving over a 40× speedup in inference and keeping VRAM usage comparable to lightweight ViT models. Our project is available at https://anonymous.4open.science/r/D3RSMDE-5547.
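The abstract names Progressive Linear Blending Refinement but does not define its schedule. Reading the name literally, one hypothetical form blends the current estimate with a refiner's output using a weight that grows linearly over a few steps, starting from the ViT prior. The sketch below works on plain 2-D lists with a stub refiner rather than on VAE latents with a U-Net, and `progressive_refine` and `refine_step` are illustrative names.

```python
from typing import Callable, List

DepthMap = List[List[float]]


def blend(a: DepthMap, b: DepthMap, w: float) -> DepthMap:
    """Element-wise linear blend (1 - w) * a + w * b."""
    return [[(1.0 - w) * x + w * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def progressive_refine(prior: DepthMap, refine_step: Callable[[DepthMap], DepthMap],
                       num_steps: int = 4) -> DepthMap:
    """Start from the coarse prior; at step k, blend the current estimate with
    the refiner's output using the linearly increasing weight k / num_steps."""
    x = [row[:] for row in prior]
    for k in range(1, num_steps + 1):
        x = blend(x, refine_step(x), k / num_steps)
    return x
```

Because the final weight is 1, the last step returns the refiner's output directly; with an identity refiner the prior passes through unchanged.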

D3-RSMDE: 40× Faster and High-Fidelity Remote Sensing Monocular Depth Estimation

Universal visual anomaly detection aims to identify anomalies in novel or unseen visual domains without additional fine-tuning, which is critical in open scenarios. Recent studies have demonstrated that pre-trained vision-language models like CLIP generalize strongly given zero or only a few normal images. However, existing methods depend on carefully designed prompt templates, complex token interactions, or fine-tuning on target domains, which limits their flexibility. In this work, we present AdaptCLIP, a simple yet effective method based on two key insights. First, adaptive visual and textual representations should be learned alternately rather than jointly. Second, comparative learning between the query image and the normal image prompt should incorporate both contextual and aligned residual features, rather than relying solely on residual features. AdaptCLIP treats CLIP models as a foundational service, adding only three simple adapters (a visual adapter, a textual adapter, and a prompt-query adapter) at their input or output ends. AdaptCLIP supports zero- and few-shot generalization across domains and, once trained on a base dataset, requires no training on target domains. AdaptCLIP achieves state-of-the-art performance on 12 anomaly detection benchmarks from industrial and medical domains, significantly outperforming existing competitive methods. We will make the code and model of AdaptCLIP available.
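The second insight, combining contextual and aligned residual features, can be illustrated without any learned components. In this hypothetical sketch, each query patch is aligned to its nearest normal-prompt patch by Euclidean distance, and its feature is the concatenation of the patch itself (context) with the absolute residual to that aligned patch. In AdaptCLIP a learned prompt-query adapter would consume such features; here a plain distance-to-nearest-normal score stands in as a training-free baseline, and all names are illustrative.

```python
import math
from typing import List, Sequence


def l2(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def align(query_patch: Sequence[float], normal_patches: Sequence[Sequence[float]]) -> Sequence[float]:
    """Nearest normal-prompt patch to a query patch (the 'aligned' patch)."""
    return min(normal_patches, key=lambda n: l2(query_patch, n))


def comparison_features(query_patches: Sequence[Sequence[float]],
                        normal_patches: Sequence[Sequence[float]]) -> List[List[float]]:
    """Per query patch: [contextual feature (the patch) | aligned residual |q - n*|]."""
    feats = []
    for q in query_patches:
        n = align(q, normal_patches)
        residual = [abs(x - y) for x, y in zip(q, n)]
        feats.append(list(q) + residual)
    return feats


def residual_anomaly_scores(query_patches: Sequence[Sequence[float]],
                            normal_patches: Sequence[Sequence[float]]) -> List[float]:
    """Training-free baseline: distance from each query patch to its aligned normal patch."""
    return [l2(q, align(q, normal_patches)) for q in query_patches]
```

Keeping the contextual half of the feature lets a downstream module distinguish patches with similar residuals but different content, which residual-only features cannot.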
