While \textbf{RE}trieval-\textbf{A}ugmented \textbf{L}LM-based \textbf{M}achine \textbf{T}ranslation (REAL-MT) shows promise, its behavior under noisy retrieval contexts remains poorly understood. In this work, we propose a noise synthesis framework and robustness metrics to assess REAL-MT under such conditions. Using Qwen-series models, we evaluate REAL-MT on idiomatic translation tasks across diverse languages and resource levels. Our results reveal that, in the presence of noise, LLMs suffer severe degradation in translation quality and frequently generate nonsensical translations. Although large reasoning models (LRMs) possess enhanced reasoning capabilities, they show no improvement in error correction and are even more susceptible to noise. By analyzing attention patterns, we find that the model shifts its focus from essential idiomatic components to noisy contextual content, leading to erroneous translations. We further investigate training-free and training-based mitigation strategies; both enhance robustness but slightly degrade performance in clean contexts. These results highlight the limitations of current approaches and underscore the need for more effective methods that balance noise resistance with knowledge integration.