Mixture-of-Experts (MoE) is a sparse neural architecture that significantly increases model capacity while keeping per-token computational cost low. However, deploying MoE-based large language models (LLMs) on memory-constrained edge devices remains challenging due to their substantial memory requirements. To address this issue, we propose FIRM-MoE, a fine-grained expert offloading framework designed to enable flexible and efficient MoE inference. The core insight of our approach is to reduce the cost of inaccurate expert loading by decomposing each expert into fine-grained sub-experts and dynamically allocating them through a fine-grained scheduling strategy. To further reduce loading errors, we introduce a multi-layer expert prediction mechanism and a resource-adaptive expert pre-loading algorithm, enabling more robust expert allocation. Together, these components yield more efficient expert utilization and greater resilience to prediction errors. We conduct extensive experiments to demonstrate the superiority of FIRM-MoE across diverse memory constraints. The results show that FIRM-MoE achieves up to 1.5× speedup and 2.8× memory savings in decoding compared to state-of-the-art MoE offloading strategies.
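The abstract does not spell out the scheduling strategy, but the central idea of caching fine-grained sub-experts (rather than whole experts) in fast memory can be illustrated with a toy sketch. Everything below is hypothetical: the class name, the sub-expert granularity, and the LRU policy are illustrative stand-ins, not the paper's actual algorithm.

```python
from collections import OrderedDict

SUB_EXPERTS_PER_EXPERT = 4  # assumed granularity: each expert split into 4 slices


class SubExpertCache:
    """Toy LRU cache holding fine-grained sub-expert slices in fast memory.

    Caching at sub-expert granularity means a mispredicted expert wastes at
    most one small slice of memory bandwidth, not an entire expert's weights.
    """

    def __init__(self, capacity):
        self.capacity = capacity      # max sub-experts resident at once
        self.cache = OrderedDict()    # (expert_id, sub_id) -> weights
        self.loads = 0                # slow-path loads from host memory/disk

    def fetch(self, expert_id, sub_id):
        key = (expert_id, sub_id)
        if key in self.cache:
            self.cache.move_to_end(key)   # LRU hit: mark as most recently used
            return self.cache[key]
        self.loads += 1                   # miss: simulate loading from slow tier
        weights = f"weights[{expert_id}:{sub_id}]"  # placeholder for real tensors
        if len(self.cache) >= self.capacity:
            self.cache.popitem(last=False)  # evict least recently used slice
        self.cache[key] = weights
        return weights


# Routing two consecutive tokens to expert 0: the first pass loads all four
# sub-experts from the slow tier; the second pass is served from cache.
cache = SubExpertCache(capacity=8)
for _ in range(2):
    for sub_id in range(SUB_EXPERTS_PER_EXPERT):
        cache.fetch(0, sub_id)
print(cache.loads)  # 4 slow-path loads total, all on the first pass
```

In a real offloading system the cache would hold weight tensors in device memory and prefetch slices predicted by the routing mechanism; the sketch only shows why finer granularity bounds the penalty of each prediction error.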