Recent advances in large language model (LLM) fine‑tuning have shown that incorporating high‑quality reasoning traces into training data can markedly improve downstream performance. However, existing approaches often depend on expensive manual annotations or auxiliary models, and fail to adapt to the unique limitations of smaller “weak” LLMs. To address these gaps, we introduce Weak2Wise, a fully automated, lightweight framework for synthesizing high‑quality, weak‑LLM‑friendly reasoning traces. Starting from a QA dataset, Weak2Wise filters out the samples that the weak LLM can already answer correctly, gathers diverse candidate reasoning traces from multiple strong LLMs, and leverages our Step‑Mask scoring to rank and truncate the most guidance‑effective traces. These reasoning traces are then used for fine‑tuning, yielding substantial improvements in the weak LLM’s reasoning abilities. The name Weak2Wise has two meanings: using a “weak” LLM to select the “wisest” reasoning traces generated by stronger LLMs, and fine‑tuning the same weak LLM on these reasoning traces to become “wiser”. We further use Weak2Wise to build GR‑1K, a 1,000‑sample math and science QA‑reasoning dataset optimized for weak LLMs, and fine‑tune Qwen2.5‑7B on it to create GR‑7B, which achieves superior performance on the AIME2024, MATH‑500, and GPQA Diamond benchmarks.
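The abstract describes a three-stage pipeline: filter, gather, then score-and-truncate. A minimal sketch of that flow is below; it is an illustration only, not the authors' implementation. The callables `weak_llm`, `strong_llms`, and `score_prefix` are hypothetical stand-ins (the abstract does not specify the Step‑Mask scoring function, so here it is abstracted as a pluggable prefix scorer).

```python
from typing import Callable, List, Tuple

def build_dataset(
    qa_pairs: List[Tuple[str, str]],                        # (question, gold answer)
    weak_llm: Callable[[str], str],                         # weak model: question -> answer
    strong_llms: List[Callable[[str], List[str]]],          # each: question -> reasoning steps
    score_prefix: Callable[[str, List[str], str], float],   # guidance score of a trace prefix
) -> List[Tuple[str, List[str]]]:
    """Hypothetical sketch of the Weak2Wise pipeline described in the abstract."""
    dataset = []
    for question, gold in qa_pairs:
        # 1. Filter: drop samples the weak LLM already answers correctly.
        if weak_llm(question).strip() == gold.strip():
            continue
        # 2. Gather candidate reasoning traces from multiple strong LLMs.
        candidates = [llm(question) for llm in strong_llms]
        # 3. Score-and-truncate (assumed form): evaluate every prefix of every
        #    trace and keep the truncation that best guides the weak LLM.
        best_prefix, best_score = None, float("-inf")
        for steps in candidates:
            for k in range(1, len(steps) + 1):
                s = score_prefix(question, steps[:k], gold)
                if s > best_score:
                    best_prefix, best_score = steps[:k], s
        if best_prefix:
            dataset.append((question, best_prefix))
    return dataset
```

Under this sketch, the retained pairs of question and truncated trace would form the fine‑tuning set (GR‑1K in the paper's instantiation); how Step‑Mask actually computes the prefix score is detailed in the paper, not here.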