Singapore

Recent advances in large language models (LLMs) have driven impressive progress in omni-modal understanding and generation. However, training omni-modal LLMs remains a significant challenge: the heterogeneous model architectures required to process diverse modalities call for sophisticated system design to train efficiently at large scale. Existing frameworks typically entangle model definition with parallel logic, which limits scalability and adds substantial engineering overhead for end-to-end omni-modal training. We present OmniScale, a modular and efficient training framework to accelerate the development of omni-modal LLMs. OmniScale introduces model-centric distributed recipes that decouple communication from computation, enabling efficient 3D parallelism on omni-modal LLMs. OmniScale also features a flexible configuration interface that supports seamless integration of new modalities with minimal code changes. Using OmniScale, an omni-modal mixture-of-experts (MoE) model with 30B parameters can be trained at over 2,800 tokens/sec/GPU and scaled to a 160K context length via 3D parallelism on 128 GPUs, showcasing its efficiency and scalability for training large omni-modal LLMs.

AAAI 2026

OmniScale: Scaling Any Modality Model Training with Model-Centric Distributed Recipe Zoo

omnimodel

mlsys

llms

poster

We are pleased to announce the Fortieth AAAI Conference on Artificial Intelligence (AAAI-26), which will be held in Singapore EXPO from January 20 to January 27, 2026.

The purpose of the AAAI conference series is to promote research in Artificial Intelligence (AI) and foster scientific exchange between researchers, practitioners, scientists, students, and engineers across the entirety of AI and its affiliated disciplines. AAAI-26 will feature technical paper presentations, special tracks, invited speakers, workshops, tutorials, poster sessions, senior member presentations, competitions, exhibit programs, and a range of other activities to be announced.




Anomaly detection on data streams presents significant challenges: methods must maintain high detection accuracy under evolving distributions while remaining efficient enough for real-time use. Here we introduce $\mathcal{IDK}$-$\mathcal{S}$, a novel $\mathbf{I}$ncremental $\mathbf{D}$istributional $\mathbf{K}$ernel for $\mathbf{S}$treaming anomaly detection that addresses these challenges by creating a new dynamic representation in the kernel mean embedding framework. The superiority of $\mathcal{IDK}$-$\mathcal{S}$ is attributed to two key innovations. First, it inherits the strengths of the Isolation Distributional Kernel, an offline detector that has demonstrated significant performance advantages over foundational methods like Isolation Forest and Local Outlier Factor due to its use of a data-dependent kernel. Second, it adopts a lightweight incremental update mechanism that significantly reduces computational overhead compared to the naive baseline of full model retraining. This reduction comes without compromising detection accuracy, a claim supported by the statistical equivalence of $\mathcal{IDK}$-$\mathcal{S}$ to the fully retrained model. Our extensive experiments on thirteen benchmarks demonstrate that $\mathcal{IDK}$-$\mathcal{S}$ achieves superior detection accuracy while operating substantially faster, in many cases by an order of magnitude, than existing state-of-the-art methods.
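To make the contrast between incremental updates and full retraining concrete, here is a minimal sketch of a sliding-window mean embedding. It is not the $\mathcal{IDK}$-$\mathcal{S}$ update rule, which the abstract does not spell out. The class name `SlidingMeanEmbedding`, the fixed window, and the assumption that each point already arrives as a finite feature vector `phi` are all hypothetical.

```python
from collections import deque


class SlidingMeanEmbedding:
    """Running mean of feature vectors over a fixed-size window.

    Hypothetical illustration: each stream point is assumed to be already
    mapped to a finite feature vector `phi` by some kernel feature map.
    """

    def __init__(self, dim, window):
        self.window = window
        self.buffer = deque()
        self.total = [0.0] * dim  # component-wise sum of vectors in the window

    def update(self, phi):
        """Add the newest vector and evict the oldest in O(dim) time."""
        self.buffer.append(phi)
        for i, value in enumerate(phi):
            self.total[i] += value
        if len(self.buffer) > self.window:
            old = self.buffer.popleft()
            for i, value in enumerate(old):
                self.total[i] -= value

    def mean(self):
        """Current mean embedding of the window."""
        n = len(self.buffer)
        return [t / n for t in self.total]

    def score(self, phi):
        """Dot product with the mean embedding; lower means less typical."""
        return sum(a * b for a, b in zip(phi, self.mean()))
```

Each update touches only the entering and leaving vectors, whereas a full recompute would re-average the entire window; both yield the same mean up to floating-point rounding.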

IDK-S: Incremental Distributional Kernel for Streaming Anomaly Detection

Recent brain decoding studies have primarily emphasized the development of brain decoders, while largely neglecting the segmentation step. Existing methods typically adopt fixed-length segmentation, which might overlook subject- or task-level variability and disrupt intrinsic neural structures within brain signals. To address this gap, we propose $\textbf{S}^\textbf{3}$, which leverages spiking neurons as an isolating segmenter for brain signal decoding. $\textbf{S}^\textbf{3}$ segments brain signals adaptively, considering subject- and task-level variability while preserving intrinsic neural structures in brain signals. It exploits the unique reset mechanism of spiking neurons to enforce temporal pattern isolation for the generation of each segmentation point. To optimize $\textbf{S}^\textbf{3}$ for enhancing task performance in the absence of segmentation labels, we develop an optimization method where pseudo-labels are created with a stochastic-greedy algorithm to optimize them, circumventing gradient blockade between them. Experiments on 10 downstream tasks across 13 public datasets demonstrate that $\textbf{S}^\textbf{3}$ consistently outperforms existing methods, validating its effectiveness, generalizability and interpretability.
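The reset mechanism mentioned above can be illustrated with a textbook leaky integrate-and-fire neuron whose spike times are read as segmentation points. This is a sketch under assumptions, not the $\textbf{S}^\textbf{3}$ model: the function name `lif_segment_points`, the scalar input drive, and the `threshold` and `decay` parameters are hypothetical.

```python
def lif_segment_points(inputs, threshold=1.0, decay=0.9):
    """Return the time steps at which a leaky integrate-and-fire neuron spikes.

    `inputs` is a hypothetical scalar drive per time step. After each spike the
    membrane potential is reset to zero, so the evidence accumulated for one
    segmentation point does not leak into the next one.
    """
    potential = 0.0
    points = []
    for t, x in enumerate(inputs):
        potential = decay * potential + x  # leaky integration
        if potential >= threshold:
            points.append(t)               # spike marks a segment boundary
            potential = 0.0                # hard reset isolates the next segment
    return points
```

Because the boundaries depend on the input rather than on a fixed stride, a stronger drive produces shorter segments and a weaker drive produces longer ones.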

S³: Spiking Neurons as an Isolating Segmenter for Brain Signal Decoding

Multi-modal object re-identification (ReID) aims to retrieve specific targets by leveraging complementary cues from different sensing modalities. Despite recent progress, two key challenges remain:
(1) the limited ability to jointly address both modality and viewpoint discrepancies, and
(2) the difficulty of effectively leveraging reliable target-domain data to improve generalization.
To address these challenges, we propose Proxy-driven Test-Time Training (ProxyTTT), a unified framework that enhances both multi-modal identity representation learning and model generalization. During training, we propose a Multi-Proxy Learning (MPL) mechanism to address the representation bias across different views and modalities. MPL disentangles fine-grained modality-specific and modality-common identity proxies as semantic anchors to align identity features across diverse perspectives and sensing modalities. This alignment strategy enables the model to learn robust and discriminative global identity representations under heterogeneous modality conditions.
At test time, to reliably exploit target domain data, we propose Proxy-guided Entropy-based Selective Adaptation (PESA) for test-time training. Specifically, PESA leverages the semantic structure encoded by identity proxies to estimate prediction uncertainty via entropy, and selectively adapts the model using only high-confidence samples. This selective adaptation effectively mitigates the domain shift between training and deployment environments, improving the model’s generalization in real-world scenarios.
Extensive experiments on four public multi-modal ReID benchmarks (RGBNT201, RGBNT100, MSVR310, and WMVeID863) demonstrate the effectiveness of ProxyTTT.
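The entropy-based selection step can be sketched as follows. This is a simplified illustration rather than the PESA implementation: the function names, the use of a plain softmax over sample-to-proxy similarities, and the `max_entropy` threshold are assumptions.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def entropy(probs):
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_confident(similarities, max_entropy):
    """Indices of samples whose proxy-based prediction entropy is low enough.

    `similarities[i]` holds hypothetical similarity scores between test sample
    i and each identity proxy; only low-entropy samples are kept for adaptation.
    """
    return [
        idx
        for idx, sims in enumerate(similarities)
        if entropy(softmax(sims)) <= max_entropy
    ]
```

A sample that matches one proxy far better than the others has near-zero entropy and is kept, while a sample that matches all proxies equally has entropy `log(K)` for `K` proxies and is skipped.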

ProxyTTT: Proxy-driven Test-Time Training for Multi-modal Re-identification

Implicit neural representations (INRs) have achieved remarkable success in image representation and compression, but they require substantial training time and memory. Meanwhile, recent 2D Gaussian Splatting (GS) methods (e.g., GaussianImage) offer promising alternatives through efficient primitive-based rendering. However, these methods require excessive Gaussian primitives to maintain high visual fidelity. To exploit the potential of GS-based approaches, we present GaussianImage++, which uses a limited number of Gaussian primitives to achieve impressive representation and compression performance. First, we introduce a distortion-driven densification mechanism that progressively allocates the budget of Gaussian primitives according to signal intensity. Second, we employ context-aware Gaussian filters for each primitive, which assist the densification in optimizing Gaussian primitives based on varying image content. Third, we integrate attribute-separated learnable scalar quantizers and quantization-aware training, enabling efficient compression of primitive attributes. Experimental results demonstrate the effectiveness of our method. In particular, GaussianImage++ outperforms GaussianImage and the INR-based COIN in representation and compression performance while maintaining real-time decoding and low memory usage. Our code will be released soon.

GaussianImage++: Boosted Image Representation and Compression with 2D Gaussian Splatting

We introduce a new notion of deterministic stable solution for non-cooperative games, termed subsidized equilibrium. It assumes that an amount of money can be used as a pool of subsidies to stabilize a strategy profile that otherwise would not be accepted by (some of) the players. Roughly speaking, for a given amount of money, a strategy profile is a subsidized equilibrium if the total payoff loss incurred by players not playing best responses does not exceed that amount, i.e., there is enough money to refund all players experiencing regret. Compared with many other solution concepts in the literature, the notion of subsidized equilibrium has important advantages. Specifically, for a sufficiently high amount of money, a subsidized equilibrium always exists and can even be computed in polynomial time; also, the existence of an efficient subsidized equilibrium can be guaranteed. Thus, determining for which amounts of money existence, polynomial-time computability, and efficiency can or cannot be achieved becomes an intriguing question. We provide initial results in this direction for some widely studied classes of games.
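The informal definition above translates directly into a check on a finite normal-form game. The sketch below follows only the abstract's wording: regret is taken as each player's gap to their best response, and the function names and the dictionary-based game encoding are assumptions made for illustration.

```python
def total_regret(payoffs, strategies, profile):
    """Sum over players of (best-response payoff - current payoff).

    payoffs: dict mapping a strategy profile (tuple) to a tuple of payoffs.
    strategies: list with one list of available strategies per player.
    """
    total = 0
    for i, options in enumerate(strategies):
        current = payoffs[profile][i]
        # Best payoff player i can reach by deviating unilaterally.
        best = max(
            payoffs[profile[:i] + (s,) + profile[i + 1:]][i] for s in options
        )
        total += best - current
    return total


def is_subsidized_equilibrium(payoffs, strategies, profile, budget):
    """True if the subsidy budget covers the regret of every player."""
    return total_regret(payoffs, strategies, profile) <= budget


# Prisoner's dilemma with the usual payoff ordering (illustrative numbers).
PD = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}
PD_STRATEGIES = [["C", "D"], ["C", "D"]]
```

In this example each player at `("C", "C")` could gain 2 by defecting, so the total regret is 4: mutual cooperation is a subsidized equilibrium for a budget of 4 but not for a budget of 3, while `("D", "D")` needs no subsidy at all.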

Compensate to Not Deviate: On Subsidised Equilibria

Assessing the strength of arguments is essential for determining the outcomes of any argument-based system. A wide range of semantics has been proposed in the literature. These take as input a set of arguments—each assigned a basic weight and potentially subject to attacks from others—and compute a single strength value for each argument. Despite the diversity of argument types (or schemes), existing semantics apply uniform evaluation criteria across all arguments. In this paper, we advocate for type-dependent evaluations, acknowledging that the impact of attacks can vary across types. Given that many argument-based systems involve heterogeneous types of arguments, we propose a broad family of hybrid semantics that combine distinct base semantics, each tailored to specific argument types. We investigate their theoretical properties, present concrete instances within this family, and examine their computational complexity.
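One way to picture a type-dependent evaluation is to let each argument type choose how it aggregates the strengths of its attackers. The sketch below is not one of the paper's hybrid semantics. It combines two update rules modelled on the weighted h-categoriser (sum of attackers) and the weighted max-based semantics (strongest attacker only), and the type labels `"sum"` and `"max"`, the function name, and the fixed iteration count are assumptions.

```python
def hybrid_strengths(weights, attackers, types, iterations=100):
    """Iteratively evaluate arguments with a type-dependent attack aggregator.

    weights:   dict argument -> basic weight in [0, 1]
    attackers: dict argument -> list of arguments attacking it
    types:     dict argument -> "sum" or "max" (hypothetical type labels)
    """
    aggregate = {
        "sum": sum,                               # every attacker counts
        "max": lambda xs: max(xs, default=0.0),   # only the strongest counts
    }
    strength = dict(weights)
    for _ in range(iterations):
        strength = {
            a: weights[a]
            / (1.0 + aggregate[types[a]]([strength[b] for b in attackers[a]]))
            for a in weights
        }
    return strength
```

With two unattacked attackers of weight 1, a target of weight 1 ends at 1/3 if it is evaluated cumulatively and at 1/2 if only its strongest attacker matters, so the same attack graph yields different strengths depending on the argument's type.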

Hybrid Semantics Accounting for Argument Types

First-order relational languages have been used in MDP planning and reinforcement learning (RL) for two main purposes: specifying MDPs in compact form, and representing and learning policies that are general and not tied to specific instances or state spaces. In this work, we instead consider the use of first-order languages in goal-conditioned RL and generalized planning. The question is how to learn goal-conditioned and general policies when the training instances are large and the goal cannot be reached by random exploration alone. The technique of Hindsight Experience Replay (HER) provides an answer to this question: it relabels unsuccessful trajectories as successful ones by replacing the original goal with one that was actually achieved. If the target policy must generalize across states and goals, trajectories that do not reach the original goal states can enable more data- and time-efficient learning. In this work, we show that further performance gains can be achieved when states and goals are represented by sets of atoms. We consider three versions: goals as full states, goals as subsets of the original goals, and goals as lifted versions of these subgoals. The result is that the latter two successfully learn general policies on large planning instances with sparse rewards by automatically creating a curriculum of easier goals of increasing complexity. The experiments illustrate the computational gains of these versions, their limitations, and opportunities for addressing them.
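The "goals as subsets of the original goals" variant can be sketched as a relabeling step over states represented as sets of ground atoms. This is a simplified illustration and not the authors' implementation: the function name, the tuple layout of transitions, and the sparse 0/1 reward are assumptions.

```python
def relabel_with_subgoal(trajectory, goal):
    """Hindsight relabeling with the part of the goal that was achieved.

    trajectory: list of (state, action, next_state) with states as frozensets
                of ground atoms such as "on(b,c)".
    goal:       frozenset of goal atoms the agent was originally asked to reach.
    Returns (subgoal, relabeled_transitions), or None if no goal atom holds
    in the final state.
    """
    final_state = trajectory[-1][2]
    subgoal = goal & final_state  # goal atoms that hold at the end
    if not subgoal:
        return None
    relabeled = []
    for state, action, next_state in trajectory:
        # Assumed sparse reward: 1 once every atom of the subgoal holds.
        reward = 1.0 if subgoal <= next_state else 0.0
        relabeled.append((state, action, next_state, subgoal, reward))
    return subgoal, relabeled
```

A trajectory that fails to reach `{on(a,b), on(b,c)}` but ends with `on(b,c)` true becomes a successful example for the easier goal `{on(b,c)}`, which is how relabeling can yield a curriculum of simpler goals.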

First-Order Representation Languages for Goal-Conditioned RL

Diffusion policies excel at robotic manipulation by naturally modeling multimodal action distributions in high-dimensional spaces. Nevertheless, diffusion policies suffer from diffusion representation collapse: semantically similar observations are mapped to indistinguishable features, ultimately impairing their ability to handle subtle but critical variations required for complex robotic manipulation. To address this problem, we propose D²PPO (Diffusion Policy Policy Optimization with Dispersive Loss). D²PPO introduces dispersive loss regularization that combats representation collapse by treating all hidden representations within each batch as negative pairs. D²PPO compels the network to learn discriminative representations of similar observations, thereby enabling the policy to identify subtle yet crucial differences necessary for precise manipulation. In evaluation, we find that early-layer regularization benefits simple tasks, while late-layer regularization sharply enhances performance on complex manipulation tasks. On RoboMimic benchmarks, D²PPO achieves an average improvement of 22.7% in pre-training and 26.1% after fine-tuning, setting new SOTA results. In real-world experiments on a Franka Emika Panda robot, our method achieves a high success rate compared with SOTA methods, and its advantage is most evident on complex tasks.
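A loss that treats every pair of hidden representations in a batch as a negative pair can be written in an InfoNCE-like form without positives. The abstract does not state which variant D²PPO uses, so the squared-distance form, the `temperature` parameter, and the function name below are assumptions chosen for illustration.

```python
import math


def dispersive_loss(batch, temperature=1.0):
    """log of the mean of exp(-||z_i - z_j||^2 / temperature) over all pairs.

    `batch` is a list of hidden representations (lists of floats). The loss is
    highest (0.0) when all representations coincide and decreases as they
    spread apart, so minimizing it pushes representations away from each other.
    Self-pairs (i == j) are included for simplicity.
    """
    n = len(batch)
    total = 0.0
    for zi in batch:
        for zj in batch:
            sq_dist = sum((a - b) ** 2 for a, b in zip(zi, zj))
            total += math.exp(-sq_dist / temperature)
    return math.log(total / (n * n))
```

In practice a term like this would be added to the policy's training objective with a small weight and applied to the activations of a chosen layer, which is where the early-layer versus late-layer distinction in the abstract comes in.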

D²PPO: Diffusion Policy Policy Optimization with Dispersive Loss

We address the problem of energy-optimal pathfinding for electric vehicles (EVs) in large-scale road networks, where energy may be recuperated along paths, introducing negative costs. While traditional routing algorithms assume a known initial energy level, many real-world scenarios require computing optimal paths for all possible initial energy levels, a task known as energy profile search. Existing solutions often rely on complex and computationally demanding profile merging procedures.
In this paper, we propose a novel A*-based energy profile search algorithm that avoids explicit profile merging by applying relaxed dominance rules within a multi-objective search framework. We present four variants of our method and evaluate them on road networks enriched with realistic energy consumption data. Experimental results show that our energy profile A* search performs comparably to conventional energy-optimal A*, which guarantees polynomial-time complexity, while additionally supporting profile queries through a simpler yet efficient solution for large-scale EV routing.
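To show what an energy profile is, the sketch below evaluates a single fixed path for several initial charge levels under the usual battery constraints: charge cannot exceed capacity, and the path is infeasible if charge would drop below zero. It illustrates the problem setting only, not the proposed A* search, and the function names and the integer energy units are assumptions.

```python
def traverse(charge, edge_costs, capacity):
    """Final charge after driving a path, or None if the battery runs empty.

    edge_costs: energy consumed per edge; negative values model recuperation.
    Recuperated energy beyond `capacity` is lost.
    """
    for cost in edge_costs:
        charge = min(capacity, charge - cost)
        if charge < 0:
            return None
    return charge


def path_profile(edge_costs, capacity, initial_levels):
    """Map each initial charge level to the final charge along the path."""
    return {b: traverse(b, edge_costs, capacity) for b in initial_levels}
```

For a path with costs `[2, -3, 1]` and capacity 4, starting at 4 ends at 3 because part of the recuperated energy is clipped, starting at 2 ends at 2, and starting at 1 is infeasible. The best path can therefore depend on the initial charge, which is why a profile over all initial levels is of interest.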

A Fast Heuristic Search Approach for Energy-Optimal Profile Routing for Electric Vehicles

Large Language Models (LLMs) have demonstrated remarkable proficiency in diverse tasks. This success raises a fundamental question in machine composition: Can symbolic music be considered a special form of language that can be jointly modeled with natural language for composition tasks? Recent studies validate that symbolic music can be modeled as a human language, yet composing structured music from partial symbolic inputs through natural language interaction remains underexplored. Even LLMs struggle to generate structurally coherent compositions in such hybrid input-output scenarios, highlighting a fundamental gap that calls for a domain-specific learning paradigm. To this end, we propose Inspiration-to-Structure (IoS), a cognitively inspired framework that enables LLMs to generate structured musical sections from melodic ideas. IoS employs a three-phase process (semantic, structural, and collaborative cognition) and is supported by two key components: (1) a new dataset and construction protocol called Structured Triplet Data (STD), and (2) a training method, Dual-Instance Structural Contrastive Optimization (DiSCO), designed to enhance structural awareness. Experiments show that IoS improves structural coherence by 47.8% and artistic creativity by 21.8% compared to the conventional language modeling paradigm, supervised fine-tuning, and even enables smaller LLMs to surpass larger LLMs. These results suggest that symbolic music, while language-like, demands specialized modeling beyond standard language modeling paradigms. IoS enables LLMs to transform music theory knowledge into structured composition, empowering users to compose music interactively via language and advancing toward general creative AI.
