The Mixture-of-Experts (MoE) architecture with expert parallelism scales LLMs efficiently by activating only a subset of experts per input, avoiding proportional growth in training cost. However, the intensive and heterogeneous communication substantially hinders the efficiency and scalability of MoE training in resource-constrained scenarios. Existing communication compression techniques fall short in MoE training because: (\textit{i}) intensive communication amplifies compression overhead, compromising training efficiency; (\textit{ii}) accumulated compression errors propagate through the network, degrading training quality. In this paper, we propose RCMoE, a communication-efficient \textbf{R}andom \textbf{C}ompression framework for \uline{MoE} training with two core modules: (\textit{i}) \textit{Local-Stochastic Quantization} compresses the all-to-all communication by stochastically quantizing each row of an expert's intermediate results in parallel, improving compression efficiency and reducing compression error; (\textit{ii}) \textit{Probabilistic Thresholding Sparsification} compresses the all-reduce communication by sampling large gradients with high probability, reducing computational complexity while maintaining convergence efficiency. Experiments on four typical MoE training tasks show that RCMoE achieves 5.9$\times$-8.1$\times$ total communication compression ratios and 1.3$\times$-10.1$\times$ training speedups compared with state-of-the-art compression techniques, while maintaining MoE training accuracy.
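The two compression primitives named above can be illustrated in isolation. The sketch below is an assumption-laden toy, not the paper's implementation: it shows (a) row-wise stochastic quantization, where each row is scaled independently (so rows can be processed in parallel) and the fractional part decides a random round-up, keeping the quantizer unbiased in expectation; and (b) probabilistic thresholding sparsification, where coordinate $i$ is kept with probability $p_i = \min(1, |g_i|/\tau)$ and rescaled by $1/p_i$ so the sparsified gradient remains unbiased. The function names, the per-row min/max scaling, and the specific keep-probability are illustrative choices, not taken from the paper.

```python
import numpy as np

def stochastic_quantize_rows(X, bits=4, rng=None):
    """Row-wise stochastic quantization to 2**bits - 1 levels.

    Each row uses its own [min, max] range, so rows are independent
    and can be quantized in parallel. Rounding up with probability
    equal to the fractional part makes the quantizer unbiased.
    """
    rng = rng or np.random.default_rng()
    levels = 2 ** bits - 1
    lo = X.min(axis=1, keepdims=True)
    hi = X.max(axis=1, keepdims=True)
    scale = np.where(hi > lo, (hi - lo) / levels, 1.0)
    t = (X - lo) / scale                      # real-valued level index in [0, levels]
    q = np.floor(t)
    q += rng.random(X.shape) < (t - q)        # round up with prob = fractional part
    return q.astype(np.uint8), lo, scale

def dequantize_rows(q, lo, scale):
    """Invert the per-row affine mapping used by the quantizer."""
    return q.astype(np.float64) * scale + lo

def prob_threshold_sparsify(g, threshold, rng=None):
    """Keep g[i] with probability min(1, |g[i]|/threshold).

    Survivors are rescaled by 1/p so the result is an unbiased
    estimate of g; large gradients (|g[i]| >= threshold) always pass.
    """
    rng = rng or np.random.default_rng()
    p = np.minimum(1.0, np.abs(g) / threshold)
    mask = rng.random(g.shape) < p
    out = np.zeros_like(g)
    out[mask] = g[mask] / p[mask]
    return out
```

Note the design point both primitives share: randomness is used to keep the compressor unbiased, so errors average out across steps instead of accumulating, which is the property the abstract attributes to random compression.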
