EMNLP 2025

November 05, 2025

Suzhou, China

Would you like to see your presentation here, made available to a global audience of researchers?
Add your own presentation or have us affordably record your next conference.

The reward model (RM) plays a crucial role in aligning Large Language Models (LLMs) with human preferences through Reinforcement Learning, where the Bradley-Terry (BT) objective has been recognized as simple yet powerful, specifically for pairwise preference comparison. However, BT-based RMs often struggle to effectively distinguish between similar preference responses, leading to insufficient separation between preferred and non-preferred outputs. Consequently, they may easily overfit easy samples and cannot generalize well to Out-Of-Distribution (OOD) samples, resulting in suboptimal performance. To address these challenges, this paper introduces an effective enhancement to BT-based RMs through an adaptive margin mechanism. Specifically, we design to dynamically adjust the RM focus on more challenging samples through margins, based on both semantic similarity and model-predicted reward differences that is approached from a distributional perspective solvable with Optimal Transport (OT). By incorporating these factors into a principled OT cost matrix design, our adaptive margin enables the RM to better capture distributional differences between chosen and rejected responses, yielding significant improvements in performance, convergence speed, and generalization capabilities. Experimental results across multiple benchmarks demonstrate that our method significantly outperforms several existing RM techniques, showcasing enhanced performance in both In-Distribution (ID) and OOD settings.

Downloads

SlidesPaperTranscript English (automatic)

Next from EMNLP 2025

UnitCoder: Scalable Code Synthesis from Pre-training Corpora
poster

UnitCoder: Scalable Code Synthesis from Pre-training Corpora

EMNLP 2025

+5Qipeng Guo
Kai Chen and 7 other authors

05 November 2025

Stay up to date with the latest Underline news!

Select topic of interest (you can select more than one)

PRESENTATIONS

  • All Presentations
  • For Librarians
  • Resource Center
  • Free Trial
Underline Science, Inc.
1216 Broadway, 2nd Floor, New York, NY 10001, USA

© 2026 Underline - All rights reserved