Recently, End-to-End Speech Translation (E2E-ST) methods leveraging large language models (LLMs) have demonstrated strong generalization and excellent scalability by integrating pre-trained speech encoders with LLMs, where Low-Rank Adaptation (LoRA) is commonly used for parameter-efficient fine-tuning to reduce training costs. However, LoRA's low-rank assumption often fails in multilingual tasks, as the inherent complexity of cross-lingual semantic relationships and syntactic variations exceeds the representational capacity of low-rank matrices. This leads to parameter conflicts across languages and, in turn, suboptimal performance. To address this issue, we propose Mixture of Low-Rank Adaptations (MoLoRA), which integrates the Mixture of Experts (MoE) mechanism with LoRA. MoLoRA enhances the model's expressive capacity while maintaining parameter efficiency during training. Specifically, we treat multiple LoRA modules as low-rank experts and introduce a routing mechanism to dynamically activate language-specific experts. Additionally, shared experts are incorporated and consistently activated to model cross-lingual general knowledge. Furthermore, to enhance the robustness and accuracy of speech representations, we propose a Multi-Granularity Representation Fusion (MGRF) module. By fusing frame-level and sentence-level features, MGRF mitigates noise-induced local distortions in frame-level speech representations, thereby providing the LLM with more accurate high-level semantic information. We conduct multilingual experiments on the MuST-C and CoVoST-2 datasets. Our method achieves an average BLEU score of 32.2 across eight language pairs on MuST-C and an average of 36.3 across three language pairs on CoVoST-2, establishing new state-of-the-art (SOTA) performance.
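To make the MoLoRA idea concrete, the following is a minimal sketch of a mixture-of-LoRA layer: several low-rank experts selected per token by a softmax router (top-k), plus always-active shared experts. The class name, rank, expert counts, and top-k value are illustrative assumptions; the abstract does not specify the paper's actual hyperparameters or routing details.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MoLoRALayer(nn.Module):
    """Sketch of a Mixture-of-LoRA layer (hyperparameters are assumed):
    routed low-rank experts activated per token by a router, plus
    shared experts that are always active for cross-lingual knowledge."""

    def __init__(self, d_model, rank=8, num_experts=4, num_shared=1, top_k=2):
        super().__init__()
        self.top_k = top_k
        # Routed (language-specific) low-rank experts: W_e = A_e @ B_e
        self.A = nn.Parameter(torch.randn(num_experts, d_model, rank) * 0.01)
        self.B = nn.Parameter(torch.zeros(num_experts, rank, d_model))
        # Shared experts, consistently activated
        self.A_s = nn.Parameter(torch.randn(num_shared, d_model, rank) * 0.01)
        self.B_s = nn.Parameter(torch.zeros(num_shared, rank, d_model))
        # Router producing per-token expert logits
        self.router = nn.Linear(d_model, num_experts)

    def forward(self, x):  # x: (batch, seq, d_model)
        weights = F.softmax(self.router(x), dim=-1)          # (B, S, E)
        topw, topi = weights.topk(self.top_k, dim=-1)        # (B, S, K)
        topw = topw / topw.sum(dim=-1, keepdim=True)         # renormalize top-k
        # All expert outputs: (B, S, E, d_model)
        expert_out = torch.einsum("bsd,edr,erD->bseD", x, self.A, self.B)
        idx = topi.unsqueeze(-1).expand(*topi.shape, x.size(-1))
        routed = (topw.unsqueeze(-1) * expert_out.gather(2, idx)).sum(dim=2)
        # Shared experts are summed unconditionally
        shared = torch.einsum("bsd,edr,erD->bseD", x, self.A_s, self.B_s).sum(dim=2)
        return routed + shared
```

As in standard LoRA, the B matrices start at zero so the adapter initially contributes nothing and training begins from the frozen base model's behavior.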
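The MGRF idea of fusing frame-level and sentence-level speech features can likewise be sketched with a simple gated fusion, where the sentence-level view is mean-pooled from the frames. The gating design and pooling choice here are assumptions for illustration; the abstract does not describe the module's exact architecture.

```python
import torch
import torch.nn as nn


class MGRFSketch(nn.Module):
    """Illustrative multi-granularity fusion (architecture assumed):
    a learned gate blends each frame-level feature with a
    sentence-level (mean-pooled) summary to smooth noisy frames."""

    def __init__(self, d_model):
        super().__init__()
        self.gate = nn.Linear(2 * d_model, d_model)

    def forward(self, frames):  # frames: (batch, seq, d_model)
        sent = frames.mean(dim=1, keepdim=True)       # sentence-level summary
        sent = sent.expand_as(frames)                 # broadcast over frames
        g = torch.sigmoid(self.gate(torch.cat([frames, sent], dim=-1)))
        # Gate decides, per dimension, how much to trust the local frame
        # versus the global sentence context.
        return g * frames + (1 - g) * sent
```

Intuitively, when a frame is corrupted by noise, the gate can lean on the sentence-level summary, which averages out local distortions.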