EMNLP 2025

November 06, 2025

Suzhou, China


Large language models (LLMs) are widely deployed as zero-shot evaluators for answer grading, content moderation, and document ranking. Yet studies show that guard models (Guards), LLMs fine-tuned for safety, remain vulnerable to "jailbreak" attacks that jeopardise downstream chatbots. We confirm this weakness on three public benchmarks (BeaverTails, XSTest, AdvBench) and trace it to representation shifts that arise in the embedding layer and cascade through the Transformer stack. To counteract this effect, we introduce Gamma-Guard: lightweight residual adapters inserted after the embeddings and at sparse intervals throughout the model. The adapters start with zero-scaled gates, so they initially preserve the original behaviour; a brief adversarial fine-tuning phase then teaches them to denoise embeddings and refocus attention. With fewer than 0.1% extra parameters and only a 2% latency increase, Gamma-Guard lifts adversarial accuracy from below 5% to 95% (a gain of 90 percentage points) while reducing clean-data accuracy by just 8 percentage points. Extensive ablations further show that the robustness improvements persist across different layer placements and model sizes. To our knowledge, this is the first approach that directly augments large Guards with trainable adapters, providing a practical path toward safer large-scale LLM deployments.
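The zero-gated adapter design described in the abstract can be sketched in a few lines. The following is a minimal PyTorch sketch, not the authors' code: the class name GatedResidualAdapter, the bottleneck width, and the scalar gate are illustrative assumptions; the abstract specifies only that the adapters are lightweight residual modules whose zero-scaled gates make them identities at initialisation.

```python
import torch
import torch.nn as nn

class GatedResidualAdapter(nn.Module):
    """Bottleneck residual adapter with a zero-initialised gate.

    With gate = 0 the module is exactly the identity, so inserting it
    after the embedding layer (or between Transformer blocks) leaves the
    frozen guard model's behaviour unchanged until adversarial
    fine-tuning opens the gate. Hypothetical sketch, not the paper's code.
    """

    def __init__(self, d_model: int, bottleneck: int = 64):
        super().__init__()
        # Down/up projection keeps the parameter count small relative
        # to d_model, consistent with the <0.1% overhead the paper reports.
        self.down = nn.Linear(d_model, bottleneck)
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck, d_model)
        # Zero-scaled gate: the adapter starts as a pure pass-through.
        self.gate = nn.Parameter(torch.zeros(1))

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq_len, d_model)
        return hidden + self.gate * self.up(self.act(self.down(hidden)))


# Example: at initialisation the adapter is an exact identity.
adapter = GatedResidualAdapter(d_model=4096)
x = torch.randn(2, 16, 4096)          # (batch, seq_len, hidden_size)
assert torch.allclose(adapter(x), x)  # gate = 0 -> output equals input
```

In such a setup the backbone would stay frozen and only the adapter parameters train during the adversarial fine-tuning phase; because the gate starts at zero, insertion is behaviour-preserving by construction, which is presumably how clean-data accuracy is protected before training begins.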

