Large language models (LLMs) have advanced rapidly, but their growing compute and memory demands make them unsustainable and limit accessibility, especially in under-resourced regions such as Southeast Asia (SEA). While recent hybrid architectures combining attention and state-space models (SSMs) have shown promise, most sequentially interleave attention and Mamba layers, leaving parallel mixing of attention and Mamba heads within the same layer largely unexplored. Hence, I propose investigating a Hymba-style parallel-head hybrid architecture as a foundation for efficient, multilingual LLMs in SEA. My short-term goal is to perform continual pre-training (CPT) of the released Hymba-1.5B weights on SEA corpora to evaluate the architecture's adaptability across diverse languages. In the longer term, I plan to study scaling strategies beyond 1.5B parameters, assessing whether parallel-head hybrids maintain efficiency and performance at larger scales. Evaluation will combine standard perplexity and benchmark tasks with SEA-specific benchmarks, alongside profiling of inference throughput and deployability on resource-constrained devices. The expected outcomes are twofold: (1) demonstrating that parallel-head hybrids can be effectively adapted to SEA multilingual contexts, and (2) providing evidence that this underexplored architecture scales efficiently. Success would broaden the design space of efficient LLMs while advancing equitable access to AI by enabling practical, low-cost, and locally relevant models for SEA communities.
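To make the architectural contrast concrete, the sketch below shows one way a parallel-head hybrid block can be organized, as opposed to sequential interleaving: an attention path and an SSM-style path read the same normalized input in parallel, and their outputs are fused before the residual connection. This is a minimal illustrative assumption, not the released Hymba-1.5B implementation: the module names, the gated causal-convolution stand-in for a true Mamba/selective-SSM head, and the simple averaging fusion are placeholders chosen to keep the example self-contained.

```python
import torch
import torch.nn as nn


class ParallelHybridBlock(nn.Module):
    """Toy parallel-head hybrid block: attention and an SSM-like path
    process the same input in parallel; their outputs are fused.
    (Illustrative sketch only, not the Hymba architecture itself.)"""

    def __init__(self, d_model: int, n_attn_heads: int = 4):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_attn_heads, batch_first=True)
        # Stand-in for a Mamba head: causal depthwise conv + gating.
        # A real hybrid would use selective-SSM (Mamba) heads here.
        self.conv = nn.Conv1d(d_model, d_model, kernel_size=4, padding=3, groups=d_model)
        self.gate = nn.Linear(d_model, 2 * d_model)
        self.proj = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.norm(x)
        # Attention path (causal mask omitted for brevity).
        attn_out, _ = self.attn(h, h, h, need_weights=False)
        # SSM-like path: causal depthwise convolution followed by a gate.
        c = self.conv(h.transpose(1, 2))[..., : x.size(1)].transpose(1, 2)
        u, g = self.gate(c).chunk(2, dim=-1)
        ssm_out = u * torch.sigmoid(g)
        # Parallel fusion: average the two paths, project, add residual.
        return x + self.proj(0.5 * (attn_out + ssm_out))


if __name__ == "__main__":
    block = ParallelHybridBlock(d_model=64)
    tokens = torch.randn(2, 16, 64)   # (batch, seq_len, d_model)
    print(block(tokens).shape)        # torch.Size([2, 16, 64])
```

The key design point the sketch tries to convey is that both paths see the same tokens at every layer, so the fusion step (rather than layer ordering) decides how attention and SSM information are combined.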