EMNLP 2025

November 05, 2025

Suzhou, China

Would you like to see your presentation here, made available to a global audience of researchers?
Add your own presentation or have us affordably record your next conference.

Instruction tuning(IT) is an effective approach for aligning large language models (LLMs) with human intentions. There is ongoing discourse regarding the data quality for IT. As an effort to find the robust criteria of data quality for IT, we introduce LimaCost, a data quality measure that exhibits a strong correlation with model performance. LimaCost utilizes LIMA dataset, which effectiveness in IT has already been validated by several previous works. LimaCost then estimates the value of a given data by estimating how many LIMA data points might be needed to approximate its gradient. Our experiments reveal that LimaCost enables effective data selection that derive high alignment performance. We demonstrate that selecting data based on high LimaCost proves to be more effective than existing data selection strategies.

Downloads

SlidesPaperTranscript English (automatic)

Next from EMNLP 2025

Graph-Reward-SQL: Execution-Free Reinforcement Learning for Text-to-SQL via Graph Matching and Stepwise Reward
poster

Graph-Reward-SQL: Execution-Free Reinforcement Learning for Text-to-SQL via Graph Matching and Stepwise Reward

EMNLP 2025

+10
Xing Chen and 12 other authors

05 November 2025

Stay up to date with the latest Underline news!

Select topic of interest (you can select more than one)

PRESENTATIONS

  • All Presentations
  • For Librarians
  • Resource Center
  • Free Trial
Underline Science, Inc.
1216 Broadway, 2nd Floor, New York, NY 10001, USA

© 2025 Underline - All rights reserved