Singapore

Large language models (LLMs) have achieved remarkable performance across a wide range of tasks. However, their substantial parameter sizes pose significant challenges for deployment on edge devices with limited computational and memory resources. Low-rank compression is a promising approach to address this issue, as it reduces both computational and memory costs, making LLMs more suitable for resource-constrained environments. Nonetheless, naïve low-rank compression methods require a significant reduction in the retained rank to achieve meaningful memory and computation savings. For a low-rank model, the ranks need to be reduced by more than half to yield efficiency gains.
Such aggressive truncation, however, typically results in substantial performance degradation.
To address this trade-off, we propose *SkipCat*, a novel low-rank compression framework that enables the use of higher ranks while achieving the same compression rates. First, we introduce an intra-layer shared low-rank projection method, where multiple matrices that share the same input use a common projection. This reduces redundancy and improves compression efficiency. Second, we propose a block skipping technique that omits computations and memory transfers for selected sub-blocks within the low-rank decomposition. These two techniques jointly enable our compressed model to retain more effective ranks under the same compression budget.
Experimental results show that, *without any additional fine-tuning*, our method outperforms previous low-rank compression approaches by 7% in accuracy on zero-shot tasks under the same compression rate. These results highlight the effectiveness of our rank-maximized compression strategy in preserving model performance under tight resource constraints.
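
The rank arithmetic behind the SkipCat abstract can be illustrated with a small parameter count. This is only a sketch under stated assumptions, not the authors' implementation: it assumes each weight matrix is factorized as W ≈ B·A, and that "shared projection" means one input-side factor A is reused while each matrix keeps its own B. The function names, the hidden size of 4096, and the 80% budget are all hypothetical.

```python
# Hypothetical parameter-count arithmetic; not the paper's implementation.

def lowrank_params(d_in, d_out, rank):
    """Parameters in W ~= B @ A, with A of shape (rank, d_in) and B of shape (d_out, rank)."""
    return rank * (d_in + d_out)


def break_even_rank(d_in, d_out):
    """Largest rank at which the factorization is no bigger than the dense matrix."""
    return (d_in * d_out) // (d_in + d_out)


def max_rank_separate(d_in, d_outs, budget):
    """Highest common rank when every matrix keeps its own A and B."""
    cost_per_rank = sum(d_in + d_out for d_out in d_outs)
    return budget // cost_per_rank


def max_rank_shared(d_in, d_outs, budget):
    """Highest rank when one A is shared and each matrix keeps its own B (assumed layout)."""
    cost_per_rank = d_in + sum(d_outs)
    return budget // cost_per_rank


if __name__ == "__main__":
    d = 4096                       # assumed hidden size
    d_outs = [d, d, d]             # e.g. three projections fed by the same input
    dense = sum(d * d_out for d_out in d_outs)
    budget = dense * 4 // 5        # keep 80% of the dense parameters (assumed)
    print("break-even rank:", break_even_rank(d, d))
    print("separate rank:  ", max_rank_separate(d, d_outs, budget))
    print("shared rank:    ", max_rank_shared(d, d_outs, budget))
```

For a square matrix the break-even rank is half the dimension, which is consistent with the abstract's remark that ranks must be reduced by more than half. Under these assumptions, three square matrices sharing one input-side factor can afford about 1.5 times the rank of separately factorized matrices at the same budget.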

AAAI 2026

SkipCat: Rank-Maximized Low-Rank Compression of Large Language Models via Shared Projection and Block Skipping

efficient ml / green ai

(large) language models

learning on the edge & model compression

poster

We are pleased to announce the Fortieth AAAI Conference on Artificial Intelligence (AAAI-26), which will be held at Singapore EXPO from January 20 to January 27, 2026.

The purpose of the AAAI conference series is to promote research in Artificial Intelligence (AI) and foster scientific exchange between researchers, practitioners, scientists, students, and engineers across the entirety of AI and its affiliated disciplines. AAAI-26 will feature technical paper presentations, special tracks, invited speakers, workshops, tutorials, poster sessions, senior member presentations, competitions, exhibit programs, and a range of other activities to be announced.


Designing molecules with desired properties, also known as oRiented molEcule Design (RED), is a fundamental task in chemistry and materials science. While graph diffusion models (GDMs) and reinforcement learning (RL) techniques show promise individually in the molecule structure generation and property optimization stages, their integration in the unified RED task often suffers from poor compatibility. The large variance among candidate molecular structures generated by GDMs can be amplified in the iterative optimization process of RL, leading to slow and unstable convergence. In this work, motivated by the adaptive and divide-and-conquer characteristics of the Mixture of Experts (MoE) architecture, we propose a novel framework called MoE-Guided Graph Diffusion Model (MEGD) that incorporates the MoE architecture to guide the orchestration of GDM and RL, promoting faster and more stable convergence in the design process. MEGD is evaluated on benchmark datasets optimizing the physical and chemical properties of AI-generated molecular structures. On all three datasets, our method outperforms the best of 9 alternative models by 7.73% on the target structural properties, while not penalizing other important application-level quality metrics of the generated molecules. A real-world case study on an emerging class of materials, metal-organic frameworks, is also conducted, which further demonstrates the effectiveness of our method in accomplishing the RED task.

MoE-Guided Graph Diffusion for Oriented Molecule Design

We make three novel contributions to parameter learning and inference in probabilistic sentential decision diagrams (PSDDs). First, rather than traversing the entire PSDD during parameter learning for each dataset example, we pioneer the use of determinism to focus only on the activated partition. Second, we demonstrate how to prune deterministic computation in inference, thereby eliminating the need to propagate probability over every node in the network for each query. Third, we introduce a technique that parallelizes a single circuit evaluation, rather than parallelizing individual multiplications or layer-wise inference. For both learning and inference, experimental results on benchmark PSDDs from various application domains demonstrate state-of-the-art performance.

Paths Not Taken: Structure-Based Pruning in PSDD Learning and Inference

Image geo-localization aims to determine the geographic location of a query image. While Multimodal Large Language Models (MLLMs) show potential for this task due to their rich world knowledge and explainable abilities, they often struggle with confirmation bias, i.e., committing to early, potentially incorrect guesses caused by visual clues with varied geographic likelihoods. In this paper, we propose GeoBayes, a novel training-free framework that formulates geo-localization as a Maximum a Posteriori (MAP) estimation task over multiple geographic hypotheses and performs probabilistic thought via sequential Bayesian reasoning. GeoBayes treats each visual object and its associated geographic clues as probabilistic evidence, integrating them iteratively through a Hypothesize–Verify–Update loop. At each step, it evaluates how new evidence supports existing hypotheses and updates their posterior probabilities, gradually converging on the most probable location. This allows GeoBayes to explicitly quantify and fuse the varied geographic probabilities implied by various visual elements, reducing the risk of overcommitting to misleading clues. Furthermore, considering the natural hierarchy of geographic labels (e.g., country, city), GeoBayes introduces a state memory mechanism that stores hypotheses, inference context, and evidence scores across levels. This design enables the framework to propagate prior knowledge across levels of the geographic hierarchy and incorporate geographic structural constraints into the Bayesian update process, achieving coarse-to-fine geo-localization. Experiments on IM2GPS3k and YFCC4K show that GeoBayes improves MLLM-based geo-localization accuracy without extra training. This demonstrates the effectiveness of probabilistic reasoning for robust and interpretable geo-localization.
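
The sequential Bayesian updating described in the GeoBayes abstract can be sketched in a few lines. This is a generic illustration of a posterior update over a fixed set of hypotheses, not the GeoBayes implementation: the function names, the country hypotheses, and every likelihood value are made up for the example, and the hierarchy and state memory mechanisms are not modeled.

```python
# Generic sequential Bayes update; all names and numbers are illustrative assumptions.

def bayes_update(prior, likelihood):
    """Return the posterior P(h | e), proportional to P(e | h) * P(h), for each hypothesis h."""
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("evidence rules out every hypothesis")
    return {h: p / total for h, p in unnormalized.items()}


def sequential_map(prior, evidence):
    """Fold each piece of evidence into the posterior, then return the MAP hypothesis."""
    posterior = dict(prior)
    for likelihood in evidence:
        posterior = bayes_update(posterior, likelihood)
    best = max(posterior, key=posterior.get)
    return best, posterior


if __name__ == "__main__":
    prior = {"France": 1 / 3, "Japan": 1 / 3, "Brazil": 1 / 3}
    evidence = [
        # lattice tower in the skyline: suggestive of France, but not conclusive
        {"France": 0.6, "Japan": 0.3, "Brazil": 0.1},
        # vehicles driving on the left
        {"France": 0.02, "Japan": 0.95, "Brazil": 0.02},
        # Japanese script on a shop sign
        {"France": 0.05, "Japan": 0.9, "Brazil": 0.05},
    ]
    print(sequential_map(prior, evidence[:1])[0])  # guess after the first clue only
    print(sequential_map(prior, evidence)[0])      # guess after all clues
```

In this toy run the first clue alone favors one hypothesis, and the later clues shift the posterior to another, which is the kind of early overcommitment the abstract says the update loop is meant to avoid.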

GeoBayes: Probabilistic Image Geo-Localization Inference via Sequential Bayesian Updating

This paper presents FAMDR, a Feature-Aligned Multimodal Denoising framework for Reliable Diagnostic Reconciliation. Existing approaches suffer from two major limitations: (1) an overemphasis on simplifying observational descriptions and (2) a failure to denoise the misleading content in radiological findings against clinical histories. Current methods often dismiss such cross-modal inconsistencies as noise rather than clinically significant signals. To bridge this gap, the framework integrates four synergistic components: (1) noise-aware multimodal alignment that preserves discriminative discrepancy features while ensuring semantic coherence, (2) cross-modal retrieval augmentation leveraging external medical knowledge to resolve ambiguous cases, (3) granular localization of noise at the pixel and phrase levels using adaptive thresholding, and (4) medical noise uncertainty quantification to provide reliable confidence estimates. Evaluated on an extended MIMIC-CXR dataset enriched with expert-annotated noise and longitudinal records, FAMDR achieves superior accuracy in denoising and inconsistency localization while preserving clinical interpretability. Its capability to generate actionable, uncertainty-aware reports advances safer and more reliable integration into diagnostic workflows.

FAMDR: Feature-Aligned Multimodal Denoising for Reliable Diagnostic Reconciliation in Medical Imaging

Semi-supervised learning (SSL) has demonstrated high performance in image classification tasks by effectively utilizing both labeled and unlabeled data. However, existing SSL methods often suffer from poor calibration, with models yielding overconfident predictions that misrepresent actual prediction likelihoods. Recently, neural networks trained with mixup, which linearly interpolates random examples from the training set, have shown better calibration in supervised settings. However, calibration of neural models remains under-explored in SSL settings. Although effective in supervised model calibration, random mixup of pseudolabels in SSL presents challenges due to the overconfidence and unreliability of pseudolabels. In this work, we introduce CalibrateMix, a targeted mixup-based approach that aims to improve the calibration of SSL models while maintaining or even improving their classification accuracy. Our method leverages training dynamics of labeled and unlabeled samples to identify "easy-to-learn" and "hard-to-learn" samples, which in turn are utilized in a targeted mixup of easy and hard samples. Experimental results across several benchmark datasets show that our method achieves lower expected calibration error (ECE) and superior accuracy compared to existing SSL approaches.
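
Two quantities named in the CalibrateMix abstract, mixup interpolation and expected calibration error, have standard textbook forms that can be sketched as follows. This is not the CalibrateMix method itself: the selection of easy and hard samples from training dynamics is not shown, and the function names, the fixed mixing weight `lam`, and the equal-width binning with 10 bins are assumptions made for illustration.

```python
# Textbook mixup and ECE in plain Python; names and defaults are illustrative assumptions.

def mixup(x_a, y_a, x_b, y_b, lam):
    """Linearly interpolate two examples and their (soft) labels with weight lam."""
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x, y


def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average of |accuracy - mean confidence| over equal-width confidence bins."""
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        index = min(int(conf * n_bins), n_bins - 1)  # a confidence of 1.0 goes in the last bin
        bins[index].append((conf, ok))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(conf for conf, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(accuracy - mean_conf)
    return ece


if __name__ == "__main__":
    # Overconfident toy model: 90% confidence everywhere, 3 of 4 predictions correct.
    print(expected_calibration_error([0.9, 0.9, 0.9, 0.9], [True, True, True, False]))
    print(mixup([0.0, 2.0], [1.0, 0.0], [2.0, 0.0], [0.0, 1.0], lam=0.25))
```

A lower ECE means predicted confidence tracks empirical accuracy more closely. In practice the mixing weight is usually drawn at random (commonly from a Beta distribution) rather than fixed as it is here.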

On the Calibration of Image Semi-Supervised Learning Models

The Mixture-of-Experts (MoE) architecture has emerged as a key enabler for scaling large language models (LLMs), empowering increased model capacity with minimal computational overhead through gating-based dynamic expert activation. However, due to the memory demands introduced by expert modules, MoE inference on resource-constrained devices is still challenging. Existing methods such as model compression and parameter offloading provide partial alleviation but often lead to reduced accuracy or increased latency. In this paper, we propose CasMoE, a general and efficient cascaded framework for accelerating MoE inference on resource-constrained devices. CasMoE employs a two-stage offline-online approach to facilitate efficient expert prefetching. In the offline stage, a parameterized Expert Activation Predictor (EAP) is introduced to accurately predict the corresponding expert activation from the incoming prompt. In the online stage, a non-parametric Expert Activation Matcher (EAM) supporting fast expert retrieval is then integrated with the EAP to form a cascade planner that operates independently of the MoE architecture, predicting activated experts for all MoE layers in a single pass prior to decoding. A gating mechanism is also incorporated to dynamically adjust the sensitivity of the EAM and EAP, enabling a flexible trade-off between inference efficiency and quality. Extensive experiments on diverse downstream tasks demonstrate CasMoE’s effectiveness in accelerating inference while preserving high accuracy.

CasMoE: A Cascaded Framework for Efficient MoE Inference on Resource-constrained Devices

Long-Tailed Multi-Label Recognition (LTML) is a critical yet challenging task due to two core issues: the severe scarcity of training samples for rare "tail" classes, and the complex co-occurrence patterns among labels that often lead to biased models. To address this, we propose DP-VLPA, a novel Dual-Phase Visual-Language Pretraining and Adaptation framework. In the first phase, our Structured Tail-Aware Generation (STAG) module employs a Large Language Model (LLM) to create detailed descriptions that explicitly emphasize tail classes and their contextual relationships, providing a strong and less-biased feature foundation. In the second adaptation phase, we ensure this knowledge is applied effectively. A Dynamic Query Reweighting (DQR) mechanism forces the model to attend to crucial tail-class evidence. Simultaneously, a Co-occurrence-Aware (COA) loss explicitly teaches the model the statistical dependencies between labels, correcting for co-occurrence biases. Extensive experiments on the VOC-LT and COCO-LT datasets demonstrate state-of-the-art performance, achieving mAP scores of 90.72% and 74.42%, respectively, surpassing previous best methods by 2.84% and 8.23%. Our code is coming soon.

Dual-Phase Visual-Language Pretraining and Adaptation for Long-Tailed Multi-Label Recognition

Variational autoencoder (VAE)-based frameworks possess a natural advantage in modeling the shared and private information inherent in multimodal data. However, current models focus on improving the quality of shared representations from the reconstruction perspective, lacking explicit mechanisms to model their underlying semantic structure. In this paper, we propose the multimodal Gaussian mixture variational autoencoder with consistency regularizations, which introduces a Gaussian mixture prior over the shared latent space to enhance its semantic structure and encourage the formation of cluster-aware latent representations. To address the cross-modal inconsistency problem under missing modality conditions, we propose a cluster-guided regularization strategy that enforces the cross-modal consistency using the pseudo-category labels from unsupervised clustering. Additionally, we design a self-supervised contrastive regularization strategy to align semantically similar representations across modalities. Extensive experiments on MNIST-SVHN and MNIST-CDCB datasets demonstrate that our method significantly outperforms prior state-of-the-art models in generation, classification, and retrieval tasks.

Multimodal Gaussian Mixture Variational Autoencoder with Consistency Regularizations

Understanding the generalization behavior of in-context learning (ICL) in Transformers remains a fundamental challenge, as most existing theoretical analyses are based on the assumption that data are independently and identically distributed (i.i.d.), an assumption that often does not hold in practice. Motivated by the theoretical insight that ICL operates similarly to gradient-based optimization, we leverage the concept of gradient stability to establish generalization error bounds for ICL without making any distributional assumptions. Our analysis shows that two factors play a central role in ICL generalization: the number of demonstrations in the prompt and their distributional alignment with the query. In particular, increasing the number of demonstrations and improving their alignment with the query distribution lead to better generalization, even without any parameter tuning. Under mild conditions, we further prove that the generalization error can achieve the optimal convergence rate of $O(N^{-\frac{1}{2}})$, where $N$ is the number of demonstrations. Our empirical evaluations validate the effectiveness of our theoretical findings.

Towards Understanding In-Context Learning of Transformers Under Non-I.I.D. Scenarios

Irregularly sampled time series (ISTS), characterized by non-uniform time intervals with natural missingness, are prevalent in real-world applications. Existing approaches for ISTS modeling primarily rely on observed values to impute unobserved ones or infer latent dynamics. However, these methods overlook a critical source of learning signal: the reconstruction error inherently produced during model training. Such error implicitly reflects how well a model captures the underlying data structure and can serve as an informative proxy for unobserved values. To exploit this insight, we propose **iTimER**, a simple yet effective self-supervised pre-training framework for ISTS representation learning. iTimER models the distribution of reconstruction errors over observed values and generates pseudo-observations for unobserved timestamps through a mixup strategy between sampled errors and the last available observations. This transforms unobserved timestamps into noise-aware training targets, enabling meaningful reconstruction signals. A Wasserstein metric aligns reconstruction error distributions between observed and pseudo-observed regions, while a contrastive learning objective enhances the discriminability of learned representations. Extensive experiments on classification, interpolation, and forecasting tasks demonstrate that iTimER consistently outperforms state-of-the-art methods under the ISTS setting.
