
EMNLP 2025

Suzhou, China


In recent months, substantial progress has been made in the complex reasoning abilities of Large Language Models (LLMs), particularly through test-time scaling. Notable examples include OpenAI's o1/o3/o4 series and DeepSeek-R1. When responding to a query, these models generate an extended reasoning trajectory in which the model explores, reflects, backtracks, and self-verifies before arriving at a conclusion. However, fine-tuning models on such reasoning trajectories may not always be optimal. Our findings indicate that not all components of these trajectories contribute positively to the reasoning process; some may in fact harm overall performance. In this study, we divide a reasoning trajectory into individual subtrajectories and develop a "5+2" framework to: (1) systematically identify suboptimal subtrajectories within the reasoning trajectory based on five human-established criteria; and (2) assess whether the suboptimal subtrajectories identified in (1) are independent of the subsequent content, ensuring that eliminating them does not compromise the overall flow and coherence of the reasoning process. Additionally, a sampling algorithm built upon the "5+2" framework selects the data whose reasoning processes are most free of suboptimal subtrajectories. Experimental results demonstrate that our method reduces the number of suboptimal subtrajectories by 25.9% during inference. Furthermore, when fine-tuning Qwen2.5-Math-7B, our method achieves an average accuracy of 58.92% on the highly challenging AIME24, AIME25, AMC24, and MATH500 benchmarks with only two thirds of the training data, surpassing the average accuracy of 58.06% achieved with the full dataset and outperforming open-source datasets including s1K-1.1, Light-R1-SFT-stage-1, OpenR1-Math-94k, and OpenThoughts-114k. Finally, we have validated the efficacy of our method under resource-constrained scenarios, where it improves performance across different maximum inference token limits: 2k, 4k, 8k, and 16k tokens.
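
The abstract does not spell out the five criteria, the two independence checks, or the sampling rule, so the following Python sketch is only an illustration of the general idea under stated assumptions, not the authors' algorithm. It assumes each subtrajectory already carries two hypothetical labels (`suboptimal` and `independent`), scores a sample by its fraction of suboptimal subtrajectories, and keeps the lowest-scoring share of the data. All names here (`Subtrajectory`, `Sample`, `suboptimal_ratio`, `prune`, `select_cleanest`) are invented for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Subtrajectory:
    text: str
    suboptimal: bool   # hypothetical label: flagged by at least one of the five criteria
    independent: bool  # hypothetical label: later content does not depend on this step


@dataclass
class Sample:
    question: str
    subtrajectories: List[Subtrajectory]


def suboptimal_ratio(sample: Sample) -> float:
    """Fraction of subtrajectories flagged as suboptimal (0.0 if there are none)."""
    subs = sample.subtrajectories
    if not subs:
        return 0.0
    return sum(s.suboptimal for s in subs) / len(subs)


def prune(sample: Sample) -> Sample:
    """Drop subtrajectories that are both suboptimal and independent of later content."""
    kept = [s for s in sample.subtrajectories
            if not (s.suboptimal and s.independent)]
    return Sample(sample.question, kept)


def select_cleanest(samples: List[Sample], keep_fraction: float) -> List[Sample]:
    """Keep the keep_fraction of samples with the lowest suboptimal ratio.

    Assumed scoring rule; sorted() is stable, so ties keep their input order.
    """
    k = round(len(samples) * keep_fraction)
    return sorted(samples, key=suboptimal_ratio)[:k]
```

With `keep_fraction=2/3`, `select_cleanest` mirrors the two-thirds data budget reported above; `prune` shows why the independence check matters, since a suboptimal step that later steps rely on is left in place.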


