EMNLP 2025

November 06, 2025

Suzhou, China

Would you like to see your presentation here, made available to a global audience of researchers?
Add your own presentation or have us affordably record your next conference.

We introduce FreqRank, a mutation-based defense to localize malicious components in LLM outputs and their corresponding backdoor triggers. FreqRank assumes that the malicious sub-string(s) consistently appear in outputs for triggered inputs and uses a frequency-based ranking system to identify them. Our ranking system then leverages this knowledge to localize the backdoor triggers present in the inputs. We train six malicious models for three downstream tasks, namely, code completion (CC), code generation (CG), and code summarization (CS), and show that they have an average attack success rate (ASR) of 80.9%. Furthermore, FreqRank’s ranking system highlights the malicious outputs as one of the top five suggestions in over 99.0% of cases. We also demonstrate that FreqRank is capable of localizing the backdoor trigger effectively even with a limited number of triggered samples. Finally, we show that our approach is 40-50% more effective than other defense methods.

Downloads

SlidesPaperTranscript English (automatic)

Next from EMNLP 2025

Fairness in Automatic Speech Recognition Isn’t a One-Size-Fits-All
poster

Fairness in Automatic Speech Recognition Isn’t a One-Size-Fits-All

EMNLP 2025

+1
Heidi Christensen and 3 other authors

06 November 2025

Stay up to date with the latest Underline news!

Select topic of interest (you can select more than one)

PRESENTATIONS

  • All Presentations
  • For Librarians
  • Resource Center
  • Free Trial
Underline Science, Inc.
1216 Broadway, 2nd Floor, New York, NY 10001, USA

© 2025 Underline - All rights reserved