EMNLP 2025

November 05, 2025

Suzhou, China


We introduce \textbf{ESGenius}, a comprehensive benchmark for evaluating and enhancing the proficiency of Large Language Models (LLMs) in Environmental, Social and Governance (ESG) and sustainability-focused question answering. \textbf{ESGenius} comprises two key components: (i) \textbf{ESGenius-QA}, a collection of \textbf{1,136} multiple-choice questions generated by LLMs and rigorously validated by domain experts, covering a broad range of ESG pillars and sustainability topics. Each question is systematically linked to its corresponding source text, enabling transparent evaluation and supporting Retrieval-Augmented Generation (RAG) methods; and (ii) \textbf{ESGenius-Corpus}, a meticulously curated repository of \textbf{225} foundational frameworks, standards, reports, and recommendation documents from \textbf{7} authoritative sources. Moreover, to fully assess the capabilities and adaptation potential of LLMs, we implement a rigorous two-stage evaluation protocol---\emph{Zero-Shot} and \emph{RAG}. Extensive experiments across \textbf{50} LLMs (ranging from 0.5B to 671B parameters) show that state-of-the-art models achieve only moderate performance in zero-shot settings, with accuracies mostly between 55\% and 70\%, highlighting ESGenius's challenging nature. Models employing RAG, however, achieve significant performance gains, particularly at smaller scales: for example, DeepSeek-R1-Distill-Qwen-14B improves from 63.82\% in the zero-shot setting to 80.46\% with RAG. These results demonstrate the necessity of grounding responses in authoritative sources for enhanced ESG understanding. To the best of our knowledge, ESGenius is the first benchmark curated for LLMs and the relevant enhancement technologies that focuses on ESG and sustainability topics.
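The two-stage protocol contrasts answering each multiple-choice question from the model's parametric knowledge alone (Zero-Shot) with answering after prepending passages retrieved from the corpus (RAG). The sketch below illustrates the idea; the toy corpus, the word-overlap retriever, and the prompt format are illustrative stand-ins, not the paper's actual pipeline.

```python
def retrieve(question, corpus, k=1):
    """Rank corpus passages by naive word overlap with the question (toy retriever)."""
    q_words = set(question.lower().split())
    ranked = sorted(corpus, key=lambda p: -len(q_words & set(p.lower().split())))
    return ranked[:k]

def build_prompt(question, choices, context=None):
    """Stage 1 (Zero-Shot): no context. Stage 2 (RAG): prepend retrieved passages."""
    parts = []
    if context:
        parts.append("Context:\n" + "\n".join(context))
    parts.append("Question: " + question)
    parts += [f"({chr(65 + i)}) {c}" for i, c in enumerate(choices)]
    return "\n".join(parts)

# Hypothetical mini-corpus and question in the spirit of ESG QA.
corpus = [
    "Scope 1 emissions are direct greenhouse gas emissions from owned sources.",
    "Board diversity is a governance indicator tracked by many ESG frameworks.",
]
question = "Which scope covers direct greenhouse gas emissions from owned sources?"
choices = ["Scope 1", "Scope 2", "Scope 3", "None of the above"]

zero_shot_prompt = build_prompt(question, choices)
rag_prompt = build_prompt(question, choices, retrieve(question, corpus))
```

In the benchmark, both prompts would be sent to the model under evaluation and accuracy compared across the two stages; the reported gains come from the grounded (RAG) stage.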

