Recently, large reasoning models have achieved impressive performance on various tasks by employing human-like deep thinking. However, the lengthy thinking process substantially increases inference overhead, making efficiency a critical bottleneck. In this work, we first demonstrate that \textit{NoThinking}, which prompts the reasoning model to skip thinking and directly generate the final solution, is a better choice for relatively simple tasks in terms of both performance and efficiency. Motivated by this, we propose \textit{AdaptThink}, a novel RL algorithm that teaches reasoning models to adaptively choose the optimal thinking mode based on problem difficulty. Specifically, \textit{AdaptThink} features two core components: (1) a constrained optimization objective that encourages the model to choose \textit{NoThinking} while maintaining overall performance; (2) an importance sampling strategy that balances \textit{Thinking} and \textit{NoThinking} samples during on-policy training, thereby enabling cold start and allowing the model to explore and exploit both thinking modes throughout training. Our experiments indicate that \textit{AdaptThink} significantly reduces inference costs while further enhancing performance. Notably, on three math datasets, \textit{AdaptThink} reduces the average response length of DeepSeek-R1-Distill-Qwen-1.5B by 53% and improves its accuracy by 2.4%, highlighting the promise of adaptive thinking-mode selection for optimizing the balance between reasoning quality and efficiency.
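The two components above can be illustrated with a minimal sketch. This is not the paper's released implementation; the helper names (`mixed_mode_batch`, `advantage`), the empty `<think></think>` prefix trick, the mixing ratio `p_nothinking`, the bonus `delta`, and the reference accuracy baseline are illustrative assumptions standing in for the actual constrained objective and importance-sampling details.

```python
import random

def mixed_mode_batch(prompts, p_nothinking=0.5, seed=0):
    """Illustrative importance-sampling mix: force a fixed fraction of
    rollouts into NoThinking mode so both modes stay represented even
    when the current policy strongly prefers one of them (this is what
    enables cold start). Hypothetical helper, not the paper's code."""
    rng = random.Random(seed)
    batch = []
    for prompt in prompts:
        mode = "nothinking" if rng.random() < p_nothinking else "thinking"
        # NoThinking is commonly prefilled as an empty thinking block so the
        # model skips deliberation and writes the final solution directly.
        prefix = "<think></think>" if mode == "nothinking" else "<think>"
        batch.append((prompt + prefix, mode))
    return batch

def advantage(reward, ref_accuracy, is_nothinking, delta=0.05):
    """Toy surrogate for the constrained objective: compare the sampled
    response's reward to a reference accuracy on the same problem, and
    grant a small bonus (delta, an assumed hyperparameter) to NoThinking
    responses so the model prefers skipping thought whenever doing so
    does not hurt accuracy."""
    bonus = delta if is_nothinking else 0.0
    return reward + bonus - ref_accuracy
```

In this sketch, easy problems (where NoThinking is usually correct, so `reward` matches `ref_accuracy`) yield a positive advantage only for NoThinking responses, while hard problems (where NoThinking fails) penalize it, pushing the policy toward difficulty-adaptive mode selection.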