EMNLP 2025

November 05, 2025

Suzhou, China

Sycophancy causes models to produce answers that cater to user expectations rather than truthful responses. Such behavior can erode user trust by creating a perception of dishonesty or bias, and may lead users to question the reliability and objectivity of the system's responses. Although Reinforcement Learning from Human Feedback (RLHF) is effective at aligning models with human preferences, previous studies have observed that it can also amplify sycophantic behavior. However, these studies focused primarily on proprietary models and relied on indirect analysis to demonstrate the influence of human feedback. Our study focuses on sycophancy in open-source models, which are more reproducible and transparent for research. We investigate the impact of human feedback on sycophancy by directly comparing models aligned with human feedback to those that are not. To address sycophancy, we propose assessing the user's expected answer rather than ignoring it. To this end, we develop the Sycophancy Answer Assessment (SAA) dataset and introduce Self-Augmented Preference Alignment, and we demonstrate that these methods effectively enhance the model's assessment ability and significantly reduce sycophancy across tasks.
