EMNLP 2025

November 05, 2025

Suzhou, China

Large Language Models (LLMs) excel at general language tasks, which motivates their adaptation to specialized domains such as healthcare. Effective domain adaptation typically involves supervised fine-tuning (SFT) on carefully selected instruction-tuning data. Current data selection methods take a data-centric approach, relying on external annotations and heuristics to identify data that are high-quality and challenging by externally defined criteria. Our exploratory experiments show that this approach fails to improve the model's domain performance because the selected data are misaligned with the model's knowledge distribution. To address this, we propose Decomposed Difficulty-based Data Selection (3DS), a two-stage, model-centric data selection framework that aligns data selection with the model's distribution. In Stage #1, 3DS applies Prompt-Driven Data Selection, which filters out noisy data based on the model's knowledge via explicit alignment. In Stage #2, it applies Decomposed Difficulty-based Data Selection, which guides selection with three novel data difficulty metrics: Instruction Understanding, Response Confidence, and Response Correctness. These metrics are refined by an attention-based importance weighting mechanism for more accurate calibration. Extensive experiments in the healthcare domain show that 3DS outperforms existing methods by more than 2.97% in accuracy, and additional validation in the law domain confirms its generalization ability. Our dataset and code are open-sourced at https://anonymous.4open.science/r/3DS-E67F.
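The abstract does not give the formulas behind the Stage #2 metrics, so the following is only a minimal sketch of the general idea of importance-weighted difficulty scoring, under stated assumptions. It assumes that a token-level signal (here, the negative log-likelihood of each response token under the model) is aggregated with per-token importance weights (for example, weights derived from attention), and that examples are then kept if their score falls inside a chosen difficulty band. The function names `weighted_token_score` and `select_by_difficulty`, and the `low`/`high` band thresholds, are hypothetical and are not taken from the 3DS code.

```python
import math
from typing import List, Sequence


def weighted_token_score(token_logprobs: Sequence[float],
                         weights: Sequence[float]) -> float:
    """Importance-weighted mean negative log-likelihood (hypothetical metric).

    token_logprobs: log-probability the model assigns to each response token.
    weights: non-negative importance weight per token (assumed to come from
             attention; any non-negative weights work here).
    Higher scores mean the model finds the response harder.
    """
    if not token_logprobs or len(token_logprobs) != len(weights):
        raise ValueError("need equal-length, non-empty sequences")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Normalize the weights to sum to 1, then average the per-token NLL.
    return sum((w / total) * -lp for lp, w in zip(token_logprobs, weights))


def select_by_difficulty(scores: Sequence[float],
                         low: float, high: float) -> List[int]:
    """Return indices of examples whose score lies in [low, high].

    Keeping a middle band (not too easy, not too hard) is an illustrative
    assumption, not a rule stated in the abstract.
    """
    return [i for i, s in enumerate(scores) if low <= s <= high]


if __name__ == "__main__":
    # Two response tokens with probabilities 0.5 and 0.25.
    lps = [math.log(0.5), math.log(0.25)]
    print(weighted_token_score(lps, [1.0, 1.0]))  # plain mean NLL
    print(weighted_token_score(lps, [3.0, 1.0]))  # first token weighted more
    print(select_by_difficulty([0.2, 0.9, 1.5, 3.0], low=0.5, high=2.0))
```

With uniform weights the score reduces to the ordinary mean negative log-likelihood; non-uniform weights let tokens judged more important dominate the score, which is the kind of calibration the abstract attributes to its attention-based weighting.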
