AAAI 2026

January 25, 2026

Singapore, Singapore


Fine-tuning large language models (LLMs) improves performance but introduces critical safety vulnerabilities: even minimal harmful data can severely compromise safety measures. We observe that perturbations orthogonal to the alignment direction—defined by the weight difference between an aligned (safe) model and an unaligned one—rapidly compromise model safety. In contrast, updates along the alignment direction largely preserve it, revealing the parameter space as a "narrow safety basin". To address this, we propose SECURE (Safety Enforcement Constraint Using Regularized Orthogonality), which maintains safety by explicitly constraining update directions during fine-tuning. By penalizing updates orthogonal to the alignment direction, SECURE effectively keeps the model within the "narrow safety basin", thus preserving its inherent safety. Extensive experiments on multiple datasets and models show that SECURE reduces harmful behaviors by up to 7.60%, improves task performance by 3.44%, and consistently outperforms existing methods across multiple tasks. Code and datasets are available at: https://anonymous.4open.science/r/69F7-ED36/.
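The penalty the abstract describes can be illustrated in a few lines: project a candidate weight update onto the alignment direction (the aligned-minus-unaligned weight difference) and penalize the squared norm of the orthogonal remainder. This is a minimal sketch under that reading, not the authors' implementation; the function name and vector representation are assumptions.

```python
# Hypothetical sketch of an orthogonality penalty in the spirit of SECURE.
# `update` and `alignment_dir` stand in for flattened weight-update and
# alignment-direction vectors; names are illustrative, not from the paper.

def orthogonal_penalty(update, alignment_dir):
    """Squared norm of the component of `update` orthogonal to `alignment_dir`."""
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))
    norm = dot(alignment_dir, alignment_dir) ** 0.5
    d = [x / norm for x in alignment_dir]          # unit alignment direction
    proj = dot(update, d)                          # component along alignment
    orth = [u - proj * di for u, di in zip(update, d)]
    return dot(orth, orth)

# An update along the alignment direction incurs no penalty;
# a fully orthogonal update is penalized by its full squared norm.
print(orthogonal_penalty([2.0, 0.0], [1.0, 0.0]))  # 0.0
print(orthogonal_penalty([0.0, 3.0], [1.0, 0.0]))  # 9.0
```

In training, a term like `lambda_reg * orthogonal_penalty(...)` would be added to the fine-tuning loss, so gradient steps that leave the safety basin are discouraged while on-direction progress is untouched.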

