Fully fine-tuning large pre-trained models for each downstream task is impractical due to prohibitive memory, computation, and storage costs. Although parameter-efficient fine-tuning (PEFT) methods address this issue, leading methods like LoRA still exhibit linear scaling of trainable parameters with hidden size. Recent studies have explored PEFT in the frequency domain to reduce computational costs, employing the fast Fourier transform and discrete cosine transform with sparse frequency selection. These methods rely on global frequency representations that lack spatial locality and disperse energy across the domain. As a result, sparse coefficient selection struggles to preserve fine-grained structural information and often introduces artifacts such as ringing near boundaries. To address these limitations, we propose DWTSG, a novel PEFT framework based on the discrete wavelet transform (DWT) and subband guidance. DWTSG decomposes pre-trained weights into four wavelet subbands that jointly encode global context and local details. It fine-tunes only the most informative coefficients in each subband via an energy-based selection strategy that ranks coefficients by their individual importance and mutual interactions. Finally, an inverse DWT reconstructs the updated weights, enabling efficient and precise adaptation. Extensive experiments on natural language understanding, commonsense reasoning, and image classification demonstrate that DWTSG outperforms existing PEFT methods in both accuracy and parameter efficiency.
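The pipeline described above (DWT decomposition, energy-based coefficient selection, inverse-DWT reconstruction) can be sketched in a few lines. The code below is a minimal conceptual illustration, not the paper's implementation: it uses a one-level 2-D Haar wavelet, scores coefficients by squared magnitude only (the paper's selection also accounts for coefficient interactions, which are not specified here), and stands in a fixed perturbation for the learned update. The function names `haar_dwt2`, `haar_idwt2`, and `topk_mask` are our own.

```python
import numpy as np


def haar_dwt2(W):
    """One-level 2-D Haar DWT: split W into four subbands (LL, LH, HL, HH)."""
    # Transform along rows: low-pass (average) and high-pass (difference).
    lo = (W[:, 0::2] + W[:, 1::2]) / np.sqrt(2)
    hi = (W[:, 0::2] - W[:, 1::2]) / np.sqrt(2)
    # Transform along columns of each half.
    LL = (lo[0::2, :] + lo[1::2, :]) / np.sqrt(2)  # global context
    HL = (lo[0::2, :] - lo[1::2, :]) / np.sqrt(2)  # local detail
    LH = (hi[0::2, :] + hi[1::2, :]) / np.sqrt(2)
    HH = (hi[0::2, :] - hi[1::2, :]) / np.sqrt(2)
    return LL, LH, HL, HH


def haar_idwt2(LL, LH, HL, HH):
    """Inverse of haar_dwt2: exact reconstruction of the weight matrix."""
    lo = np.empty((LL.shape[0] * 2, LL.shape[1]))
    lo[0::2, :] = (LL + HL) / np.sqrt(2)
    lo[1::2, :] = (LL - HL) / np.sqrt(2)
    hi = np.empty_like(lo)
    hi[0::2, :] = (LH + HH) / np.sqrt(2)
    hi[1::2, :] = (LH - HH) / np.sqrt(2)
    W = np.empty((lo.shape[0], lo.shape[1] * 2))
    W[:, 0::2] = (lo + hi) / np.sqrt(2)
    W[:, 1::2] = (lo - hi) / np.sqrt(2)
    return W


def topk_mask(sub, k):
    """Boolean mask selecting the k highest-energy (squared) coefficients."""
    flat = (sub ** 2).ravel()
    idx = np.argpartition(flat, -k)[-k:]
    mask = np.zeros(flat.shape, dtype=bool)
    mask[idx] = True
    return mask.reshape(sub.shape)


rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8))          # stand-in for a pre-trained weight
subbands = haar_dwt2(W)

# Sanity check: the DWT is lossless, so inverting it recovers W exactly.
assert np.allclose(haar_idwt2(*subbands), W)

# "Fine-tune" only the top-4 coefficients per subband; the 0.01 perturbation
# stands in for a gradient-learned update restricted to the selected mask.
updated = []
for sub in subbands:
    mask = topk_mask(sub, 4)
    updated.append(sub + np.where(mask, 0.01, 0.0))
W_new = haar_idwt2(*updated)             # reconstructed, adapted weight
```

Only 16 of the 64 coefficients carry trainable updates in this toy run, yet the inverse transform spreads them back into a full dense weight, which is what lets wavelet-domain PEFT keep the parameter count small while still editing the whole matrix.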