EMNLP 2025

November 05, 2025

Suzhou, China

Would you like to see your presentation here, made available to a global audience of researchers?
Add your own presentation or have us affordably record your next conference.

Chain-of-Thought (CoT) distillation has emerged as a promising paradigm to enhance the reasoning ability of small language models by imitating the reasoning and outputs of larger teacher models. However, existing approaches suffer from a critical limitation: a distribution mismatch between teacher-generated training trajectories and the student model's own generative distribution. This mismatch leads to exposure bias during inference and often induces mode collapse or mode averaging, thereby degrading the student model's generative diversity and robustness. To address these issues, we propose CoTD-PO (Chain-of-Thought Distillation with Preference Optimization), a reinforcement learning framework that shifts the training paradigm from passive imitation to active trajectory exploration. Instead of forcing the student to imitate exact teacher traces, our method enables the student to sample its own reasoning paths. To support training with non-open-source teacher models, we approximate the teacher’s output distribution through preference-based scoring. Furthermore, we adopt an offline iterative training procedure that enables stable and efficient optimization. Experiments on diverse open-ended generation tasks demonstrate that CoTD-PO significantly outperforms standard CoT distillation baselines, achieving higher output quality while mitigating mode collapse and preserving semantic diversity.

Downloads

SlidesPaperTranscript English (automatic)

Next from EMNLP 2025

Triangulating LLM Progress through Benchmarks, Games, and Cognitive Tests
poster

Triangulating LLM Progress through Benchmarks, Games, and Cognitive Tests

EMNLP 2025

+6Raquel Fernández
Raffaella Bernardi and 8 other authors

05 November 2025

Stay up to date with the latest Underline news!

Select topic of interest (you can select more than one)

PRESENTATIONS

  • All Presentations
  • For Librarians
  • Resource Center
  • Free Trial
Underline Science, Inc.
1216 Broadway, 2nd Floor, New York, NY 10001, USA

© 2025 Underline - All rights reserved