Small language models (SLMs) run quickly, consume little memory, and can be deployed on edge devices, making them especially appealing when compute or energy is limited. Because of these advantages, boosting SLMs' reasoning ability has become an important research goal. A common approach is to distill the long chains of thought (long-CoTs) produced by large reasoning models (LRMs) into SLMs, hoping to transfer the larger models' strong reasoning ability. However, SLMs do not always benefit from distilling long-CoTs: the lengthy, semantically complex steps and the large amount of self-reflection content in long-CoTs may exceed the limited learning capacity of SLMs, and the impact of self-reflection density on SLM performance remains unclear. To resolve this capacity mismatch, we propose \textbf{MACoT}, a multi-agent framework that \textit{synthesizes} chains of thought (CoTs) better suited to small models, rather than compressing or pruning existing ones. Through interactive collaboration among six types of agents, \textbf{MACoT} synthesizes semantically explicit, logically clear CoTs whose carefully designed output pattern efficiently activates a small model's internal knowledge. At the same time, the synthesized CoTs retain a small amount of self-reflection content, matching the learning capacity of the small model and maximizing its reasoning accuracy. We fine-tuned Qwen2.5-7B-Instruct on only 1,879 synthesized CoTs, significantly improving its performance on mathematical reasoning tasks and achieving strong generalization, outperforming models trained on 5x more data. Our experiments show that a modest level of self-reflection boosts small-model performance, whereas excessive reflection sharply degrades it, indicating that "teaching SLMs to think" hinges on aligning each CoT's cognitive load with the model's capacity.