Would you like to see your presentation here, made available to a global audience of researchers?
Add your own presentation or have us affordably record your next conference.
Prompt-injection and jailbreak attacks can coerce large language models (LLMs) into revealing system prompts or producing unsafe content, threatening real-world deployments. We present Proxy Barrier (ProB), a lightweight defense that interposes a repeater proxy LLM between the user and the target model. The repeater acts to verbatim-echo benign user input, and any divergence indicates adversarial tampering, causing the request to be dropped before it reaches the target model, so that attempts to bypass safety boundaries are blocked. ProB therefore requires no access to model weights or prompts, is model-agnostic, and deployable entirely at the API level. Experiments across multiple model families demonstrate that ProB achieves state-of-the-art resilience against prompt leakage and jailbreak attacks. Notably, our approach achieves up to 98.8% improvement in defense effectiveness over baselines, and shows robust protection across both open and closed-source LLMs when suitably paired with proxy models.