Test-Time Scaling (TTS) is a promising approach for progressively eliciting a model's intelligence during inference. Recently, training-based test-time scaling methods, such as continued reinforcement learning (RL), have surged in popularity, while training-free methods have faded from prominence. However, the additional train-time computation substantially increases the overall cost of test-time scaling. In this paper, we design a finer-grained sequential scaling method, Conditional Step-level Self-refinement, supported by process verification. Building on its effectiveness, we further combine it with classical parallel scaling methods at the step level, introducing a novel paradigm called Hybrid Test-Time Scaling. Extensive experiments on five instruction-tuned LLMs of different scales (3B-14B) and families demonstrate that this hybrid strategy, which incorporates multiple training-free test-time scaling methods at a finer granularity, has considerable potential for expanding the reasoning performance boundaries of LLMs.
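
The abstract describes the paradigm without implementation details, so the following is only a minimal Python sketch of how step-level hybrid scaling could be structured. Here `generate_step_candidates`, `score_step`, and `refine_step` are hypothetical stand-ins for the LLM sampler, the process reward model, and a refinement prompt, and the candidate count and refinement threshold are assumed illustrative values, not the paper's settings.

```python
import random
from typing import List

N_CANDIDATES = 4        # parallel samples per step (assumed value)
REFINE_THRESHOLD = 0.7  # PRM score below which refinement triggers (assumed)
MAX_STEPS = 8           # cap on reasoning steps (assumed)

def generate_step_candidates(prefix: List[str], n: int) -> List[str]:
    # Stand-in for n parallel LLM decodes sharing the same reasoning prefix.
    return [f"step {len(prefix) + 1} (sample {i})" for i in range(n)]

def score_step(prefix: List[str], candidate: str) -> float:
    # Stand-in for a process reward model scoring a single reasoning step.
    return random.random()

def refine_step(prefix: List[str], candidate: str) -> str:
    # Stand-in for prompting the LLM to revise a low-scoring step.
    return candidate + " [refined]"

def hybrid_tts() -> List[str]:
    steps: List[str] = []
    for _ in range(MAX_STEPS):
        # Parallel scaling: sample several candidate next steps at once.
        candidates = generate_step_candidates(steps, N_CANDIDATES)
        # Process verification: keep the candidate the PRM scores highest.
        best_score, best = max((score_step(steps, c), c) for c in candidates)
        # Conditional step-level self-refinement: revise only when the
        # verifier flags a weak step, spending extra compute where needed.
        if best_score < REFINE_THRESHOLD:
            best = refine_step(steps, best)
        steps.append(best)
    return steps

if __name__ == "__main__":
    for s in hybrid_tts():
        print(s)
```

The key design point illustrated is that refinement is conditional: sequential scaling (self-refinement) is invoked only on steps that parallel sampling plus process verification fail to resolve, rather than uniformly at every step.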