AAAI 2026

Singapore, Singapore


LM-based judging reward modeling, represented by generative reward models, has enabled cost-effective and efficient RLAIF by using synthetic preferences generated by language models in place of human annotations. We show for the first time that such reward modeling is formally consistent with Natural Language Inference (NLI), a core task in natural language understanding. Accordingly, we recast this reward modeling paradigm as an NLI task, which implies that reward modeling capability can be improved by extending the comprehension boundaries of language models. Through performance evaluation, we find that masked language models that incorporate contextual explanations show superior comprehension boundaries compared with current mainstream autoregressive models. We therefore propose ESFP-RM, a lightweight two-stage reward model that leverages an Explanation based Slot Framework for Prediction. Experimental results show that ESFP-RM, despite its lighter architecture, significantly outperforms existing LM-based judging reward modeling frameworks, including generative reward models, in both RLHF and OOD tasks. By providing more stable and generalizable reward signals for large model alignment, ESFP-RM emerges as a more promising reward modeling paradigm than generative reward modeling.
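
The NLI framing in the abstract can be pictured with a small sketch. This is not the ESFP-RM implementation, whose details are not given here. It only illustrates, under stated assumptions, how judging a response could be posed as filling a slot in a cloze-style statement and how slot scores could be turned into a scalar reward. The template wording, the label words, and the function names (`build_slot_input`, `reward_from_slot_logits`, `prefer`) are all hypothetical, and the logits are made-up numbers standing in for what a masked language model would output at the slot position.

```python
import math

# Hypothetical cloze template; "[MASK]" marks the slot a masked LM would fill.
TEMPLATE = (
    "Prompt: {prompt}\n"
    "Response: {response}\n"
    "Explanation: {explanation}\n"
    "Given the explanation, the response is [MASK]."
)

# Hypothetical label words mapped to NLI-style outcomes (1.0 = supported, 0.0 = not).
LABEL_WORDS = {"good": 1.0, "bad": 0.0}


def build_slot_input(prompt, response, explanation):
    """Render the premise (prompt, response, explanation) and the slotted hypothesis."""
    return TEMPLATE.format(prompt=prompt, response=response, explanation=explanation)


def reward_from_slot_logits(logits):
    """Softmax over the label-word logits, then take the expected label value."""
    top = max(logits.values())
    exps = {word: math.exp(value - top) for word, value in logits.items()}
    total = sum(exps.values())
    return sum(LABEL_WORDS[word] * e / total for word, e in exps.items())


def prefer(logits_a, logits_b):
    """Return which of two responses gets the higher reward (ties go to A)."""
    reward_a = reward_from_slot_logits(logits_a)
    reward_b = reward_from_slot_logits(logits_b)
    return "A" if reward_a >= reward_b else "B"


if __name__ == "__main__":
    text = build_slot_input(
        prompt="What is 2 + 2?",
        response="4",
        explanation="The response states the correct sum.",
    )
    print(text)
    # Made-up slot logits standing in for a masked LM's output.
    print(reward_from_slot_logits({"good": 2.0, "bad": 0.0}))
    print(prefer({"good": 2.0, "bad": 0.0}, {"good": 0.0, "bad": 2.0}))
```

In a real system the logits would come from a masked language model scoring the label words at the slot, and the explanation would be produced by an earlier stage; both parts are left out here because the abstract does not specify them.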

