AAAI 2026 Main Conference

January 24, 2026

Singapore, Singapore


Small language models (SLMs) run quickly, consume little memory, and can be deployed on edge devices, making them especially appealing when compute or energy is limited. Because of these advantages, boosting SLMs' reasoning ability has become an important research goal. A common approach is to distill the long chains of thought (long-CoTs) produced by large reasoning models (LRMs) into SLMs, hoping to transfer the larger models' strong reasoning ability. However, SLMs do not always benefit from long-CoT distillation: the lengthy, semantically complex steps and the large amount of self-reflection content in long-CoTs may exceed SLMs' limited learning capacity, and the impact of self-reflection density on SLM performance remains unclear. To resolve this capacity mismatch, we propose MACoT, a multi-agent framework that synthesizes chains of thought (CoTs) better suited to small models, rather than compressing or pruning existing ones. Through interactive collaboration among six types of agents, MACoT synthesizes semantically explicit, logically clear CoTs whose carefully designed output pattern efficiently activates a small model's internal knowledge. At the same time, the synthesized CoTs retain a small amount of self-reflection content, matching the learning capacity of the small model and maximizing its reasoning accuracy. Fine-tuning Qwen2.5-7B-Instruct on only 1,879 synthetic CoTs significantly improves its performance on mathematical reasoning tasks and generalizes well, outperforming models trained on 5x more data. Our experiments show that a modest level of self-reflection boosts small-model performance, whereas excessive reflection sharply degrades it: "teaching SLMs to think" hinges on aligning each CoT's cognitive load with the model's capacity.
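The finding that a modest amount of self-reflection helps while excessive reflection hurts suggests filtering training CoTs by reflection density. A minimal sketch of that idea is below; the marker list, the sentence-level density metric, and the 0.2 threshold are illustrative assumptions, not details taken from the paper.

```python
# Hypothetical sketch: keep a synthetic CoT for SLM fine-tuning only if its
# self-reflection density is modest. Marker list and threshold are assumptions.

REFLECTION_MARKERS = ("wait", "let me reconsider", "on second thought", "hmm")

def reflection_density(cot: str) -> float:
    """Fraction of sentences that contain a self-reflection marker."""
    sentences = [s.strip() for s in cot.split(".") if s.strip()]
    if not sentences:
        return 0.0
    reflective = sum(
        any(marker in s.lower() for marker in REFLECTION_MARKERS)
        for s in sentences
    )
    return reflective / len(sentences)

def keep_for_slm(cot: str, max_density: float = 0.2) -> bool:
    """Accept a CoT only when its reflection density stays below the cap."""
    return reflection_density(cot) <= max_density
```

In practice one would tune `max_density` against downstream accuracy, since the paper's result implies the relationship between reflection density and SLM performance is non-monotonic.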
