Singapore

Cross-Domain Few-Shot Object Detection (CD-FSOD) is an extremely challenging task due to the inherent data scarcity and the substantial domain shift between the source and target domains. Existing methods often suffer from overfitting and noisy feature representations, which hinder the construction of discriminative class prototypes in the target domain. In this paper, we propose a novel framework with sparse instance learning (SI-ViTO) for CD-FSOD, which leverages instance sparsity to achieve better detection with less representation. SI-ViTO adopts a dual-stage sparsity module that applies instance feature sparsity not only to the few-shot support images but also to the query images. This dual sparsity enables the model to effectively preserve salient foreground semantics while filtering out redundant or noisy information. Furthermore, a new prototype calibration strategy dynamically refines the class prototypes with query instances to accelerate prototype adaptation. Extensive experimental results on CD-FSOD benchmarks show that SI-ViTO outperforms state-of-the-art methods, demonstrating that fewer but more discriminative representations yield better cross-domain few-shot object detection performance than more abundant ones.
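The two mechanisms named above can be illustrated with a minimal NumPy sketch. It assumes that "instance feature sparsity" means keeping only the top-`k` largest-magnitude channels of each instance feature, and that "prototype calibration" means a momentum blend of the prototype with query instances that resemble it. The helper names, `k`, the momentum `m`, and the cosine threshold are hypothetical; this is not the SI-ViTO module itself.

```python
import numpy as np

def topk_sparsify(feats: np.ndarray, k: int) -> np.ndarray:
    """Keep the k largest-magnitude channels of each row (one instance per row); zero the rest."""
    out = np.zeros_like(feats)
    idx = np.argsort(-np.abs(feats), axis=1)[:, :k]   # indices of the top-k channels per row
    rows = np.arange(feats.shape[0])[:, None]
    out[rows, idx] = feats[rows, idx]
    return out

def l2_normalize(x: np.ndarray) -> np.ndarray:
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + 1e-12)

def calibrate_prototype(proto, query_feats, m=0.9, thresh=0.5):
    """Hypothetical calibration: blend the prototype with the mean of the query
    instances whose cosine similarity to it exceeds `thresh`."""
    sims = l2_normalize(query_feats) @ l2_normalize(proto)
    matched = query_feats[sims > thresh]
    if len(matched) == 0:
        return proto   # no confident query instances: leave the prototype unchanged
    return m * proto + (1 - m) * matched.mean(axis=0)

rng = np.random.default_rng(0)
support = topk_sparsify(rng.normal(size=(5, 16)), k=4)  # 5 support instances, 16-d features
query = topk_sparsify(rng.normal(size=(8, 16)), k=4)    # 8 query instances
proto = calibrate_prototype(support.mean(axis=0), query)
```

Sparsifying both support and query features before building or updating the prototype mirrors the "dual" aspect described in the abstract; the momentum form is just one simple way to let query evidence move the prototype gradually.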

AAAI 2026

Less Is Better: Sparse Instance Learning for Cross-Domain Few-Shot Object Detection

cv: representation learning for vision

cv: learning & optimization for cv

cv: object detection & categorization

poster

We are pleased to announce the Fortieth AAAI Conference on Artificial Intelligence (AAAI-26), which will be held at the Singapore EXPO from January 20 to January 27, 2026.

The purpose of the AAAI conference series is to promote research in Artificial Intelligence (AI) and foster scientific exchange among researchers, practitioners, scientists, students, and engineers across the entirety of AI and its affiliated disciplines. AAAI-26 will feature technical paper presentations, special tracks, invited speakers, workshops, tutorials, poster sessions, senior member presentations, competitions, exhibit programs, and a range of other activities to be announced.



Graph Neural Networks (GNNs) are expressive architectures for learning from complex graph-structured data. However, their practical use is often limited by the high computational cost of neighborhood aggregation. Recent efforts have focused on knowledge distillation from GNNs to inference-efficient Multi-Layer Perceptrons (MLPs). Yet most existing works treat this distillation as an embedding alignment problem, overlooking the need to replicate the topology-aware smoothing behavior that arises from message passing in GNNs. Moreover, existing methods are primarily performance-driven, ignoring critical real-world requirements such as fairness. In this work, we make two key observations: $\textit{(1)}$ state-of-the-art distillation methods fail to capture the heterogeneous smoothness patterns of GNNs, limiting structural awareness in MLPs, and $\textit{(2)}$ they introduce significant individual and group fairness violations. We introduce $\texttt{FAITH}$, the first $\textit{fair and structurally aware GNN-to-MLP distillation framework with graph-free inference.}$ To improve structural awareness in MLPs, we propose a neighborhood-guided energy alignment objective that transfers not only node-level energy but also the distribution of energies across local neighborhoods. To improve individual fairness, $\texttt{FAITH}$ introduces a novel $\ell_{2,1}$-norm objective that preserves structured similarity in the learned representations. Additionally, we incorporate a counterfactual invariance objective that explicitly encourages the model to learn representations that are statistically independent of the sensitive attribute. We provide a comprehensive theoretical analysis of $\texttt{FAITH}$, interpreting it through a novel instantiation of the Information Bottleneck principle.
Extensive experiments on 11 benchmark datasets show that $\texttt{FAITH}$ achieves stronger structural awareness and delivers a better trade-off between utility and fairness than existing methods.
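Two quantities in this abstract have standard forms that a small sketch can make concrete. Below, the $\ell_{2,1}$ norm follows the common row convention (the sum of the Euclidean norms of the rows), and "node-level energy" is assumed to be the per-node Dirichlet energy, i.e. the summed squared feature difference to each neighbor, a usual measure of how smooth a GNN's outputs are over the graph. The abstract does not spell out FAITH's exact definitions, so both choices are assumptions.

```python
import numpy as np

def l21_norm(M: np.ndarray) -> float:
    """l_{2,1} norm, row convention: sum of the Euclidean norms of the rows."""
    return float(np.linalg.norm(M, axis=1).sum())

def node_dirichlet_energy(X: np.ndarray, edges) -> np.ndarray:
    """Per-node energy: sum of squared feature differences to each neighbor (undirected edges)."""
    energy = np.zeros(X.shape[0])
    for i, j in edges:
        d = float(np.sum((X[i] - X[j]) ** 2))
        energy[i] += d
        energy[j] += d
    return energy

X = np.array([[0.0, 0.0], [3.0, 4.0], [3.0, 4.0]])   # 3 nodes, 2-d features
edges = [(0, 1), (1, 2)]
total = l21_norm(X)                         # row norms 0 + 5 + 5
energies = node_dirichlet_energy(X, edges)  # nodes 0 and 1 differ; nodes 1 and 2 are identical
```

Here node 2 has zero energy because it matches its only neighbor, while nodes 0 and 1 each carry the full squared difference of their shared edge; a distribution over such values across a neighborhood is the kind of signal an energy-alignment objective could target.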

Leap of FAITH from GNN-to-MLP: Fairness Aware Inference via DisTillation of GrapH Knowledge

The sparsity of user–item interactions remains a fundamental obstacle in collaborative filtering, limiting the ability of Graph Neural Network (GNN)-based recommender systems to capture high-order user relationships without incurring over-smoothing and computational overhead. Existing social recommendation approaches mitigate this by incorporating social networks, yet most rely on explicit ties and fail to construct informative links in their absence. Meanwhile, contrastive learning (CL) has shown promise in improving representation quality, but current view generation strategies, namely augmentation-based views for robustness and non-augmentation-based views for semantic fidelity, are seldom combined, leaving their complementary potential underexplored. We propose Social Generating with Multiview-guided Tuning (SGMT), a unified framework that addresses both challenges. First, an interest-aware social generation mechanism constructs synthetic user–user links from shared interaction patterns; we show theoretically that these links compress collaborative paths and uncover latent high-order relations. Second, we present two complementary CL modules, Noise-augmented View and Semantic-explored View, which we prove theoretically to preferentially enhance uniformity and alignment, respectively, the two fundamental objectives of CL. Experiments on three real-world datasets show that SGMT outperforms state-of-the-art baselines, validating both the theoretical analysis and the practical efficacy of our model.
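"Uniformity" and "alignment" are usually measured with the metrics of Wang and Isola (2020): alignment is the mean squared distance between normalized embeddings of positive pairs, and uniformity is the log of the mean Gaussian potential $e^{-t\lVert z_i - z_j\rVert^2}$ over all pairs. The sketch below implements those standard definitions; whether SGMT uses exactly these forms is an assumption.

```python
import numpy as np

def normalize(z: np.ndarray) -> np.ndarray:
    """Project embeddings onto the unit hypersphere."""
    return z / np.linalg.norm(z, axis=1, keepdims=True)

def alignment(z1: np.ndarray, z2: np.ndarray, alpha: float = 2.0) -> float:
    """Mean distance^alpha between two views of the same users (lower = better aligned)."""
    return float(np.mean(np.linalg.norm(z1 - z2, axis=1) ** alpha))

def uniformity(z: np.ndarray, t: float = 2.0) -> float:
    """Log of the mean Gaussian potential over all distinct pairs (lower = more uniform)."""
    sq = np.sum((z[:, None, :] - z[None, :, :]) ** 2, axis=-1)
    iu = np.triu_indices(len(z), k=1)          # each unordered pair once, no self-pairs
    return float(np.log(np.mean(np.exp(-t * sq[iu]))))

angles = np.linspace(0, 2 * np.pi, 8, endpoint=False)
spread = normalize(np.stack([np.cos(angles), np.sin(angles)], axis=1))  # evenly spread users
collapsed = normalize(np.ones((8, 2)))                                  # all users identical
```

A fully collapsed embedding scores 0 on uniformity, the worst possible value, while evenly spread points score lower; identical views score 0 on alignment. A noise-augmented view that mainly pushes uniformity and a semantic view that mainly pushes alignment would therefore act on different terms.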

SGMT: Social Generating with Multiview-Guided Tuning In Recommender Systems

Large language models (LLMs) increasingly support multilingual understanding and generation. Meanwhile, efforts to interpret their internal mechanisms have emerged, offering insights for enhancing multilingual performance. While multi-head self-attention (MHA) has proven critical in many areas, its role in multilingual capabilities remains underexplored. In this work, we study the contribution of MHA to multilingual processing in LLMs. We propose Language Attention Head Importance Scores (LAHIS), an effective and efficient method that identifies the importance of attention heads for multilingual capabilities with a single forward and backward pass through the LLM. Applying LAHIS to Aya-23-8B, Llama-3.2-3B, and Mistral-7B-v0.1, we reveal the existence of both language-specific and language-general heads. Language-specific heads enable cross-lingual attention transfer, guiding the model toward target-language contexts and mitigating the off-target language generation issue, which helps address key challenges in multilingual LLMs. We also introduce a lightweight adaptation that learns a soft head mask to modulate attention outputs over language heads, requiring only 20 tunable parameters to improve XQuAD accuracy. Overall, our work enhances both the interpretability and the multilingual capabilities of LLMs from the perspective of MHA.
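To make "a soft head mask to modulate attention outputs" concrete, here is a minimal NumPy sketch that scales each head's output by a sigmoid gate before the heads are merged. The sigmoid parametrization, the `(num_heads, seq_len, head_dim)` layout, and the function name are assumptions, not the LAHIS implementation. With one gate per head, the number of trainable parameters equals the number of gated heads.

```python
import numpy as np

def apply_soft_head_mask(head_outputs: np.ndarray, mask_logits: np.ndarray) -> np.ndarray:
    """Scale each attention head's output by a sigmoid gate in (0, 1).

    head_outputs: (num_heads, seq_len, head_dim); mask_logits: (num_heads,), one per head.
    """
    gates = 1.0 / (1.0 + np.exp(-mask_logits))
    return head_outputs * gates[:, None, None]

num_heads, seq_len, head_dim = 4, 3, 2
outputs = np.ones((num_heads, seq_len, head_dim))
logits = np.array([0.0, 10.0, -10.0, 0.0])   # hypothetical learned gate logits
masked = apply_soft_head_mask(outputs, logits)
# heads 0 and 3 are halved, head 1 is kept almost intact, head 2 is nearly switched off
```

Because only the gate logits are trained and the base model stays frozen, this style of adaptation stays extremely small in parameter count.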

Focusing on Language: Revealing and Exploiting Language Attention Heads in Multilingual Large Language Models

Strategyproofness has been the holy grail in mechanism design for decades, providing strong incentive compatibility guarantees under the assumption of perfectly rational agents. However, this assumption is questionable when agents exhibit bounded rationality. Moreover, strategyproofness often imposes strong impossibility results that prevent mechanisms from surpassing certain approximation barriers. We study this tension in budget-feasible mechanism design, where a designer wants to procure services of maximum value from agents subject to a budget constraint. Here, strategyproofness imposes approximation barriers of $1+\sqrt{2}$ and $2$ for deterministic and randomized mechanisms, respectively.

We investigate how much we can potentially gain under bounded rationality. We adopt the weaker notion of \emph{not obviously manipulable (NOM)}, which only prevents "obvious" strategic deviations. We fully resolve the achievable approximation guarantees under NOM: We derive a deterministic $2$-approximate NOM mechanism under the general class of monotone subadditive valuations.
We also show that this bound is tight (even for additive valuations). Additionally, we provide a simple randomized $(1+\varepsilon)$-approximate NOM mechanism for any $\varepsilon > 0$. These results demonstrate a clear separation between strategyproof and NOM mechanisms. Our mechanisms use \emph{Golden Tickets} and \emph{Wooden Spoons} as natural design primitives, arising from our characterization of NOM mechanisms.

Breaking Barriers, Finding Boundaries: Not Obviously Manipulable Budget-Feasible Mechanism Design

Generating accurate multilingual text with diffusion models has long been desired but remains challenging. Recent methods have made progress in rendering text in a single language, but rendering text in arbitrary languages is still underexplored. This paper introduces EasyText, a text rendering framework based on DiT (Diffusion Transformer) that connects denoising latents with multilingual characters encoded as character tokens. We propose character positioning encoding and position encoding interpolation techniques to achieve controllable and precise text rendering. Additionally, we construct a large-scale synthetic text image dataset with 1 million multilingual image-text annotations, as well as a high-quality dataset of 20K annotated images, which are used for pretraining and fine-tuning, respectively. Extensive experiments and evaluations demonstrate the effectiveness and advantages of our approach in multilingual text rendering, visual quality, and layout-aware text integration.

EasyText: Controllable Diffusion Transformer for Multilingual Text Rendering

Given an undirected graph and a size parameter $k$, the Densest $k$-Subgraph (D$k$S) problem extracts the subgraph on $k$ vertices with the largest number of induced edges. While D$k$S is NP-hard and difficult to approximate, penalty-based continuous relaxations of the problem have recently enjoyed practical success on real-world instances of D$k$S. In this work, we propose a scalable and exact continuous penalization approach for D$k$S based on the error bound principle, which enables the design of suitable penalty functions. Notably, we develop new theoretical guarantees ensuring that both the global and local optima of the penalized problem match those of the original problem. The proposed penalized reformulation enables the use of first-order continuous optimization methods. In particular, we develop a non-convex proximal gradient algorithm whose non-convex proximal operator can be computed in closed form, resulting in low per-iteration complexity. We also provide a convergence analysis of the algorithm. Experiments on large-scale instances of the D$k$S problem and one of its variants, the Densest ($k_1, k_2$) Bipartite Subgraph (D$k_1k_2$BS) problem, demonstrate that our method achieves a favorable balance between computation cost and solution quality.
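The objective in the first sentence has a compact algebraic form: for a 0/1 indicator vector $x$ of a vertex subset and a symmetric adjacency matrix $A$, the number of induced edges is $x^\top A x / 2$. Continuous relaxations of the kind mentioned above typically let $x$ range over $[0,1]^n$ with entries summing to $k$; the sketch below does not reproduce the paper's error-bound penalty or proximal algorithm, and only gives the exact brute-force baseline, which is feasible for tiny graphs alone.

```python
import itertools
import numpy as np

def induced_edges(A: np.ndarray, x: np.ndarray) -> int:
    """Edges induced by the vertex set with 0/1 indicator x: x^T A x / 2 for symmetric A."""
    return int(x @ A @ x) // 2

def densest_k_subgraph_bruteforce(A: np.ndarray, k: int):
    """Exact DkS by enumerating all k-subsets; exponential, so only for tiny graphs."""
    n = A.shape[0]
    best_set, best = None, -1
    for S in itertools.combinations(range(n), k):
        x = np.zeros(n, dtype=int)
        x[list(S)] = 1
        e = induced_edges(A, x)
        if e > best:
            best_set, best = S, e
    return best_set, best

# A triangle {0, 1, 2} with a path 2-3-4 hanging off it
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4)]
A = np.zeros((5, 5), dtype=int)
for i, j in edges:
    A[i, j] = A[j, i] = 1
```

For $k = 3$ the triangle is the unique densest choice, with 3 induced edges; the combinatorial blow-up of this enumeration is what motivates continuous formulations.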

A Scalable and Exact Relaxation for Densest k-Subgraph via Error Bounds

Recent advances in large language models (LLMs) have greatly improved their reasoning and decision-making abilities when deployed as agents. Richer reasoning, however, often comes at the cost of a longer chain of thought (CoT), hampering interaction efficiency in real-world scenarios. Yet a systematic definition of LLM-agent efficiency is still lacking, which hinders targeted improvements.
To this end, we introduce dual-efficiency, comprising (i) step-level efficiency, which minimizes tokens per step, and (ii) trajectory-level efficiency, which minimizes the number of steps to complete a task.
Building on this definition, we propose **DEPO**, a dual-efficiency preference-based optimization method that jointly rewards succinct responses and fewer action steps. Experiments on WebShop and BabyAI show that **DEPO** cuts token usage by up to 60.9% and steps by up to 26.9%, while achieving up to a 29.3% improvement in task performance. **DEPO** also generalizes to three out-of-domain math benchmarks and retains its efficiency gains when trained on only 25% of the data. The code is available in the appendix.
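The two halves of the dual-efficiency definition can be measured directly on a trajectory. The sketch below computes tokens per step and number of steps, and adds a hypothetical preference rule between two successful trajectories (fewer steps first, then fewer tokens per step). Whitespace splitting stands in for a real tokenizer, and the `Step` type and `more_efficient` rule are illustrative assumptions, not DEPO's reward.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Step:
    response: str  # the agent's reasoning and action text at one step

def tokens_per_step(traj: List[Step]) -> float:
    """Step-level efficiency metric: mean tokens per step (lower is better).
    Whitespace splitting stands in for a real tokenizer."""
    return sum(len(s.response.split()) for s in traj) / len(traj)

def num_steps(traj: List[Step]) -> int:
    """Trajectory-level efficiency metric: steps taken to finish the task (lower is better)."""
    return len(traj)

def more_efficient(a: List[Step], b: List[Step]) -> List[Step]:
    """Hypothetical preference between two successful trajectories:
    fewer steps first, then fewer tokens per step."""
    return min(a, b, key=lambda t: (num_steps(t), tokens_per_step(t)))

verbose = [Step("I think I should search for red shoes now"),
           Step("search red shoes"), Step("click buy")]
concise = [Step("search red shoes"), Step("click buy")]
chosen = more_efficient(verbose, concise)   # would serve as the preferred response
```

In a preference-optimization setup, pairs like `(chosen, rejected)` built this way would reward both shorter steps and shorter trajectories at once.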

DEPO: Dual-Efficiency Preference Optimization for LLM Agents

Bipartite learning is a machine learning task aimed at predicting interactions between pairs of instances. It has been applied to a variety of domains, including drug-target interaction, RNA-disease association, and regulatory network inference. Despite being widely investigated, current methods still have drawbacks: they are often designed for a specific application and thus do not generalize to other problems, or they suffer from scalability issues. To address these challenges, we propose Oxytrees: proxy-based biclustering model trees. Oxytrees compress the interaction matrix into row- and column-wise proxy matrices to significantly reduce training time without impacting predictive performance. We also propose a new leaf-assignment algorithm that significantly reduces prediction time. In addition, Oxytrees employ linear models using the Kronecker product kernel in their leaves, resulting in shallower trees and thus even faster training. Using 15 datasets, we compared the predictive performance of ensembles of Oxytrees against the current state of the art. We achieve up to a 30-fold improvement in training time over state-of-the-art biclustering forests, while showing competitive or superior performance in most evaluation settings, especially in the inductive setting. Finally, we provide an intuitive Python API to access all datasets, methods, and evaluation measures used in this work, enabling reproducible research in this field.
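The Kronecker product kernel, as commonly defined for pairwise data, factorizes a kernel over (row, column) pairs into a row kernel times a column kernel: $k((r,c),(r',c')) = k_r(r,r')\,k_c(c,c')$. The sketch below uses an RBF base kernel as an arbitrary choice and checks the standard identity that lets such models multiply by the pairwise kernel without ever building it. It illustrates the kernel only, not Oxytrees' leaf models.

```python
import numpy as np

def rbf(X: np.ndarray, gamma: float = 1.0) -> np.ndarray:
    """RBF Gram matrix; an arbitrary choice of base kernel for this illustration."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-gamma * sq)

rng = np.random.default_rng(0)
rows = rng.normal(size=(4, 3))   # e.g. 4 drugs with 3 features each
cols = rng.normal(size=(5, 2))   # e.g. 5 targets with 2 features each
Kr, Kc = rbf(rows), rbf(cols)
K_pair = np.kron(Kr, Kc)         # 20 x 20 kernel over all (drug, target) pairs

# Pair (i, k) maps to index i * 5 + k, so each entry factorizes into row and column kernels
i, k, j, l = 1, 2, 3, 4
assert np.isclose(K_pair[i * 5 + k, j * 5 + l], Kr[i, j] * Kc[k, l])

# The full pairwise matrix never has to be built: (Kr ⊗ Kc) vec(Y) = vec(Kr Y Kc^T),
# where vec stacks Y row by row (NumPy's default ravel order)
Y = rng.normal(size=(4, 5))      # one value per (drug, target) pair
assert np.allclose(K_pair @ Y.ravel(), (Kr @ Y @ Kc.T).ravel())
```

The second identity replaces a product with an $(nm) \times (nm)$ matrix by two small matrix products, which is the usual reason Kronecker kernels scale well on interaction matrices.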

Oxytrees: Model Trees for Bipartite Learning

Graph Neural Networks (GNNs) have effectively improved the performance of Cognitive Diagnosis Models (CDMs). Existing works have proposed a series of Graph-based Cognitive Diagnosis Frameworks (GCDFs) to enhance robustness to noise. However, these robust designs are often generic methods for GNNs rather than designs tailored to cognitive diagnosis, so they can discard genuine cognitive information during denoising. Interestingly, a noteworthy phenomenon has been overlooked: even without robustness designs, GCDFs can still learn correct information in noisy environments. In this paper, we conduct a comprehensive empirical analysis of this phenomenon. We find that noise primarily accumulates in the lower singular components, while the principal subspaces of the representations remain stable even in noisy environments. Based on these findings, we propose a Noise-aware Cognitive Diagnostic framework based on Low-rank Alignment, named NCDLA. The framework first performs a low-rank reconstruction of the student-exercise interaction matrix, retaining only the larger singular values to reduce noise. The reconstructed and original interaction matrices are then combined with the Q matrix to form a noise-reduced heterogeneous graph and an original heterogeneous graph. To distinguish between the interaction patterns of correct and incorrect responses, we decompose the heterogeneous graph according to response type. NCDLA denoises student and exercise representations through a self-supervised strategy based on low-rank reconstruction and a spectral anchor regularisation method. Extensive experiments on three datasets demonstrate that NCDLA achieves the best prediction performance and robustness.
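"Retaining only larger singular values" is a truncated SVD, which by the Eckart–Young theorem gives the best rank-$r$ approximation in Frobenius norm. The toy below uses a continuous rank-1 matrix plus Gaussian noise rather than real binary correct/incorrect responses, and the `rank` argument is a hypothetical choice; it only illustrates why noise spread over the lower singular components is largely removed.

```python
import numpy as np

def low_rank_reconstruct(R: np.ndarray, rank: int) -> np.ndarray:
    """Rebuild R from its `rank` largest singular values (truncated SVD)."""
    U, s, Vt = np.linalg.svd(R, full_matrices=False)   # s is sorted in descending order
    return (U[:, :rank] * s[:rank]) @ Vt[:rank]

rng = np.random.default_rng(0)
ability = rng.uniform(0.5, 1.5, size=(50, 1))   # 50 students
ease = rng.uniform(0.5, 1.5, size=(1, 30))      # 30 exercises
clean = ability @ ease                          # rank-1 "true" interaction signal
noisy = clean + 0.1 * rng.normal(size=clean.shape)
denoised = low_rank_reconstruct(noisy, rank=1)
```

Comparing `np.linalg.norm(denoised - clean)` with `np.linalg.norm(noisy - clean)` shows the reconstruction sitting much closer to the clean signal, because most of the noise energy lives outside the leading singular direction.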

Noise-Aware Graph-Based Cognitive Diagnostic Framework Through Low-Rank Alignment

The text-to-SQL task is an active challenge in Natural Language Processing. Many existing solutions focus on using black-box language models extended with specialized components within customized end-to-end text-to-SQL pipelines. While these solutions use both closed-source proprietary language models and coding-oriented open-source models, there is a lack of research regarding SQL-specific small generative models. At the same time, recent advancements in self-correcting generation strategies show promise for improving the capabilities of existing architectures. The application of these concepts to the text-to-SQL task remains unexplored.
In this paper, we introduce RetrySQL, a new approach to training text-to-SQL generation models. We prepare reasoning steps for reference SQL queries and then corrupt them to create retry data that contains both incorrect and corrected steps, separated by a special token. We continuously pre-train open-source coding models on this data and demonstrate that retry steps yield improvements of up to 4 and 9 percentage points in overall and challenging execution metrics, respectively, compared with pre-training without retry data. We show that the model learns the self-correcting behavior and that the increase in downstream accuracy metrics results from this additional skill. Finally, we incorporate RetrySQL-trained models into the full text-to-SQL pipeline and show that they are competitive in execution accuracy with proprietary models that have orders of magnitude more parameters.
RetrySQL demonstrates that self-correction can be learned in the text-to-SQL task and provides a novel way of improving generation accuracy for small SQL-oriented language models.
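The retry-data recipe described above (corrupt a step, then follow it with a special token and the correct step) can be sketched as a small data-construction function. The token name `[BACK]`, the corruption rule, and the per-step probability `p` are all hypothetical; the abstract does not specify them.

```python
import random
from typing import Callable, List, Optional

RETRY_TOKEN = "[BACK]"  # hypothetical name; the abstract only says "a special token"

def make_retry_example(steps: List[str], corrupt: Callable[[str], str],
                       p: float = 0.5, seed: Optional[int] = None) -> str:
    """Before each reasoning step, with probability p, insert a corrupted copy
    followed by the retry token, so the correct step reads as a correction."""
    rng = random.Random(seed)
    out = []
    for step in steps:
        if rng.random() < p:
            out.append(corrupt(step))   # incorrect step
            out.append(RETRY_TOKEN)     # marks the previous step as one to discard
        out.append(step)                # corrected (reference) step
    return "\n".join(out)

steps = ["SELECT name FROM singer", "ORDER BY age DESC"]

def corrupt_step(s: str) -> str:
    """Hypothetical corruption rule: wrong table name and flipped sort order."""
    return s.replace("singer", "singers").replace("DESC", "ASC")

example = make_retry_example(steps, corrupt_step, p=1.0)
```

With `p=1.0` every step gets a wrong attempt followed by `[BACK]` and the fix; with `p=0.0` the output is just the original reasoning, which corresponds to the no-retry pre-training baseline.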

Downloads

Next from AAAI 2026

Leap of FAITH from GNN-to-MLP: Fairness Aware Inference via DisTillation of GrapH Knowledge

