EMNLP 2025

November 08, 2025

Suzhou, China

Would you like to see your presentation here, made available to a global audience of researchers?
Add your own presentation or have us affordably record your next conference.

Jailbreaking, the phenomenon where specific prompts cause LLMs to assist with harmful requests, remains a critical challenge in NLP, particularly in non-English and lower-resourced languages. To address this, we introduce MULBERE, a method that extends the method of Targeted Latent Adversarial Training (T-LAT) to a multilingual context. We first create and share a multilingual jailbreak dataset spanning high-, medium-, and low-resource languages, and then fine-tune LLaMA-2-7b-chat with interleaved T-LAT for jailbreak robustness and chat examples for model performance. Our evaluations show that MULBERE reduces average multilingual jailbreak success rates by 75\% compared to the base LLaMA safety training and 71\% compared to English-only T-LAT while maintaining or improving standard LLM performance.

Downloads

Paper

Next from EMNLP 2025

JustDial: Language Model Dialect Debiasing Using Biased Character Trait Associations
workshop paper

JustDial: Language Model Dialect Debiasing Using Biased Character Trait Associations

EMNLP 2025

Walter Gerych
Walter Gerych and 2 other authors

08 November 2025

Stay up to date with the latest Underline news!

Select topic of interest (you can select more than one)

PRESENTATIONS

  • All Presentations
  • For Librarians
  • Resource Center
  • Free Trial
Underline Science, Inc.
1216 Broadway, 2nd Floor, New York, NY 10001, USA

© 2026 Underline - All rights reserved