EMNLP 2025

November 06, 2025

Suzhou, China

Would you like to see your presentation here, made available to a global audience of researchers?
Add your own presentation or have us affordably record your next conference.

Prompt-injection and jailbreak attacks can coerce large language models (LLMs) into revealing system prompts or producing unsafe content, threatening real-world deployments. We present Proxy Barrier (ProB), a lightweight defense that interposes a repeater proxy LLM between the user and the target model. The repeater acts to verbatim-echo benign user input, and any divergence indicates adversarial tampering, causing the request to be dropped before it reaches the target model, so that attempts to bypass safety boundaries are blocked. ProB therefore requires no access to model weights or prompts, is model-agnostic, and deployable entirely at the API level. Experiments across multiple model families demonstrate that ProB achieves state-of-the-art resilience against prompt leakage and jailbreak attacks. Notably, our approach achieves up to 98.8% improvement in defense effectiveness over baselines, and shows robust protection across both open and closed-source LLMs when suitably paired with proxy models.

Downloads

SlidesPaperTranscript English (automatic)

Next from EMNLP 2025

AraSafe: Benchmarking Safety in Arabic LLMs
poster

AraSafe: Benchmarking Safety in Arabic LLMs

EMNLP 2025

Hamdy Mubarak
Majd Hawasly and 2 other authors

06 November 2025

Stay up to date with the latest Underline news!

Select topic of interest (you can select more than one)

PRESENTATIONS

  • All Presentations
  • For Librarians
  • Resource Center
  • Free Trial
Underline Science, Inc.
1216 Broadway, 2nd Floor, New York, NY 10001, USA

© 2025 Underline - All rights reserved