EMNLP 2025

November 05, 2025

Suzhou, China

Automatic model assessment has long been a critical challenge. Traditional methods, usually matching-based or small-model-based, often fall short in open-ended and dynamic scenarios. Recent advances in Large Language Models (LLMs) have inspired the "LLM-as-a-judge" paradigm, in which LLMs are leveraged to perform scoring, ranking, or selection across a variety of machine learning evaluation scenarios. This paper presents a comprehensive survey of LLM-based judgment and assessment, offering an in-depth overview of this evolving field. We first define the task from both input and output perspectives. We then introduce a systematic taxonomy that explores LLM-as-a-judge along three dimensions: what to judge, how to judge, and how to benchmark. Finally, we highlight key challenges and promising future directions for this emerging area. We have released and will maintain a paper list on LLM-as-a-judge at: https://anonymous.4open.science/r/Awesome-LLM-as-a-judge-266D
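For readers new to the paradigm, the sketch below illustrates the "selection" variant of LLM-as-a-judge: a judge model is shown a question and two candidate answers and asked to pick the better one. This is a minimal illustration under stated assumptions, not the survey's method; the `judge_pair` function, the prompt wording, and the `complete` helper (a stand-in for any chat-completion API) are hypothetical.

```python
# Minimal sketch of pairwise "LLM-as-a-judge" selection.
# `complete` is a hypothetical stand-in for an LLM chat-completion call;
# the prompt and parsing below are illustrative, not the survey's method.

JUDGE_PROMPT = """You are an impartial judge. Given a question and two
candidate answers, decide which answer is better.

Question: {question}
Answer A: {answer_a}
Answer B: {answer_b}

Reply with exactly one letter: "A" or "B"."""


def complete(prompt: str) -> str:
    """Hypothetical LLM call; wire this to your provider's chat API."""
    raise NotImplementedError("replace with an actual LLM endpoint")


def judge_pair(question: str, answer_a: str, answer_b: str) -> str:
    """Ask the judge model to pick the better of two candidate answers."""
    reply = complete(JUDGE_PROMPT.format(
        question=question, answer_a=answer_a, answer_b=answer_b))
    verdict = reply.strip().upper()[:1]
    # Fall back to a tie if the judge's output cannot be parsed.
    return verdict if verdict in ("A", "B") else "tie"
```

In practice, judges of this kind are known to be sensitive to answer ordering (position bias), so evaluations typically query each pair twice with the candidate positions swapped and aggregate the two verdicts.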
