Large Reasoning Models (LRMs) have recently demonstrated impressive performance across a range of reasoning tasks by generating intermediate thoughts. However, these models can suffer from overthinking—generating excessive tokens that contribute little to final accuracy while increasing inference cost. To mitigate this, we propose TIV (Thought Injection via Vectors), a framework that compresses token-level reasoning into compact vectors without sacrificing performance. Rather than generating explicit thoughts, TIV injects learnable vectors into the post-attention hidden states of the final token across Transformer layers, enabling implicit and lightweight reasoning. We further introduce a two-stage reinforcement learning strategy: the first stage calibrates the model's reasoning distribution, and the second distills it into a vector-based policy optimized for both accuracy and brevity. Experiments on three reasoning benchmarks show that TIV preserves over 99% of the original accuracy while reducing output length by more than 65% on average, and by up to 80% in some cases. Moreover, TIV consistently achieves a better accuracy-efficiency trade-off than existing methods, establishing it as a state-of-the-art (SOTA) approach for efficient reasoning in LRMs.
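The core mechanism the abstract describes—adding a learnable vector to the post-attention hidden state of the final token at each Transformer layer—can be sketched as follows. This is a minimal NumPy illustration of the injection idea only; the shapes, the `inject` helper, and the placement of the call inside the layer loop are assumptions for exposition, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
num_layers, seq_len, hidden_dim = 4, 5, 8

# Hypothetical learnable injection vectors, one per Transformer layer.
# In TIV these would be trained (e.g., via the two-stage RL strategy);
# here they are randomly initialized for illustration.
injection_vectors = rng.normal(size=(num_layers, hidden_dim))

def inject(hidden_states, layer_idx):
    """Add the layer's learnable vector to the post-attention hidden
    state of the FINAL token only; earlier positions are untouched."""
    out = hidden_states.copy()
    out[-1] += injection_vectors[layer_idx]
    return out

# Toy forward pass over per-layer hidden states (attention/MLP omitted).
h = rng.normal(size=(seq_len, hidden_dim))
for layer in range(num_layers):
    # ... self-attention + MLP for this layer would run here ...
    h = inject(h, layer)
```

Because the vectors live in hidden-state space rather than token space, the model carries reasoning signal forward without emitting any explicit chain-of-thought tokens, which is where the reported 65–80% output-length reduction would come from.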