
AAAI 2026

January 22, 2026

Singapore

Combining Mixture of Experts (MoE) with Low-Rank Adaptation (LoRA) has shown promising efficiency in multi-task instruction tuning for Large Language Models (LLMs). While existing routing schemes for such MoE systems employ auxiliary functions to ensure both expert selection certainty and workload balance among experts, they are hindered by two critical challenges: (1) existing methods overlook the evolving cross-expert relationships across layers, leading to inefficient expert utilization; (2) the auxiliary functions fail to incorporate cross-task semantic characteristics during expert assignment, leading to suboptimal task adaptation. To address these challenges, we propose Hybrid routing for a Mixture of LoRA Experts (HotMoE), a novel multi-task instruction tuning framework that adapts hierarchical routing to the distinct characteristics of different LLM layers. First, we design a hybrid routing module: in lower layers, expert-expert attention facilitates cross-task collaboration and generalization; in higher layers, token-expert attention enables precise alignment between task semantics and specialized experts. Second, we introduce a similarity-guided auxiliary loss module that regularizes routing decisions by exploiting hidden-state similarities. This loss reinforces expert specialization without sacrificing certainty of expert selection, promoting cohesive activation patterns among semantically related tasks while sharpening distinctions between conflicting ones. Experiments across two multi-task instruction tuning scenarios covering seven NLP benchmarks demonstrate that HotMoE consistently outperforms all baselines, improving Mean Relative Difference by up to 1.68% with only 3.1% of trainable parameters.
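To make the layer-dependent routing idea concrete, below is a minimal PyTorch sketch of what a hybrid router and a similarity-guided auxiliary loss could look like. It is an illustration under stated assumptions, not the authors' implementation: the names (HybridLoRARouter, expert_keys, layer_boundary, similarity_guided_aux_loss), the top-k gating, and the exact form of the loss are all hypothetical choices consistent with the abstract's description.

```python
# Hypothetical sketch of hybrid routing over LoRA experts, assuming a standard
# transformer where lower layers use expert-expert attention and higher layers
# use token-expert attention, as described in the abstract.
import torch
import torch.nn as nn
import torch.nn.functional as F


class HybridLoRARouter(nn.Module):
    def __init__(self, hidden_dim: int, num_experts: int, layer_idx: int,
                 layer_boundary: int, top_k: int = 2):
        super().__init__()
        self.layer_idx = layer_idx            # transformer layer this router serves
        self.layer_boundary = layer_boundary  # layers below this use expert-expert attention
        self.top_k = top_k
        # One learnable embedding per LoRA expert, used as routing keys.
        self.expert_keys = nn.Parameter(torch.randn(num_experts, hidden_dim) * 0.02)
        # Projects token hidden states into the routing space.
        self.token_proj = nn.Linear(hidden_dim, hidden_dim, bias=False)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, seq_len, hidden_dim)
        scale = self.expert_keys.shape[-1] ** -0.5
        if self.layer_idx < self.layer_boundary:
            # Lower layers: expert-expert attention mixes expert embeddings so that
            # related experts share information before tokens are routed to them.
            attn = F.softmax(self.expert_keys @ self.expert_keys.T * scale, dim=-1)
            mixed_keys = attn @ self.expert_keys               # (num_experts, hidden_dim)
            logits = self.token_proj(hidden_states) @ mixed_keys.T
        else:
            # Higher layers: token-expert attention aligns each token's semantics
            # directly with specialized experts.
            logits = self.token_proj(hidden_states) @ self.expert_keys.T
        # Sparse top-k gating, renormalized to sum to 1 per token.
        weights = F.softmax(logits, dim=-1)                    # (batch, seq_len, num_experts)
        topk_w, topk_idx = weights.topk(self.top_k, dim=-1)
        gates = torch.zeros_like(weights).scatter(-1, topk_idx, topk_w)
        return gates / gates.sum(dim=-1, keepdim=True).clamp_min(1e-9)


def similarity_guided_aux_loss(hidden_states: torch.Tensor,
                               gates: torch.Tensor) -> torch.Tensor:
    """Hypothetical auxiliary loss: align the similarity structure of routing
    decisions with the similarity structure of the hidden states, so that
    semantically related inputs activate cohesive expert patterns."""
    h = F.normalize(hidden_states.mean(dim=1), dim=-1)  # (batch, hidden_dim)
    g = F.normalize(gates.mean(dim=1), dim=-1)          # (batch, num_experts)
    sim_h = h @ h.T      # semantic similarity between samples
    sim_g = g @ g.T      # similarity of their expert-activation patterns
    return F.mse_loss(sim_g, sim_h.detach())
```

In this sketch the only moving part is where the routing logits come from: below the (assumed) layer_boundary the expert keys are first smoothed by attention over one another, while above it tokens attend to the raw expert keys. The auxiliary loss is one plausible way to "exploit hidden-state similarities"; the paper should be consulted for the actual formulation.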


