Lecture image placeholder

Premium content

Access to this content requires a subscription. You must be a premium user to view this content.

Monthly subscription - $9.99Pay per view - $4.99Access through your institutionLogin with Underline account
Need help?
Contact us
Lecture placeholder background

EMNLP 2025

November 05, 2025

Suzhou, China

Would you like to see your presentation here, made available to a global audience of researchers?
Add your own presentation or have us affordably record your next conference.

Recently, Agentic AI has become an increasingly popular field of research. However, we argue that current practices on agent research are far from standard, rigorous scientific research, which makes it hard to conduct apples-to-apples comparisons among and against existing methods. As a result, it is still obscure how different design choices in an agent framework impact its effectiveness, and measuring progress on agent research remains very hard. In this work, we conduct a systematic empirical study on the GAIA benchmark to investigate the impact of different popular design choices within key agent components in a fair and rigorous way. To begin with, we find that the lack of a standard evaluation protocol makes previous works, even the open-sourced ones, not reproducible, and the variance between different random runs is often non-negligible. Therefore, we first introduce a more robust evaluation protocol to make comparisons more stable. Our empirical study then unveils which components and designs, as well as correlations between these designs, are the keys for building effective agents, while others are not and redundant, despite seemingly making sense. With the insights gained from our empirical study, we build and open-source OAgents, a new foundation agent framework that achieves state-of-the-art performance among open-source projects, providing a good starting point and guidelines for building effective agents. More importantly, \owo{} supports various design choices for agent components in a modularized way, facilitating future scientific research on Agentic AI.

Downloads

Paper
access premium content

Next from EMNLP 2025

2Columns1Row: A Russian Benchmark for Textual and Multimodal Table Understanding and Reasoning
poster

2Columns1Row: A Russian Benchmark for Textual and Multimodal Table Understanding and Reasoning

EMNLP 2025

+1Alena Fenogenova
Alena Fenogenova and 3 other authors

05 November 2025

Stay up to date with the latest Underline news!

Select topic of interest (you can select more than one)

PRESENTATIONS

  • All Presentations
  • For Librarians
  • Resource Center
  • Free Trial
Underline Science, Inc.
1216 Broadway, 2nd Floor, New York, NY 10001, USA

© 2026 Underline - All rights reserved