AAAI 2026

January 22, 2026

Singapore, Singapore

Large language models (LLMs) have shown impressive capabilities in natural language tasks, yet they continue to struggle with multi-step mathematical reasoning, where correctness depends on a precise chain of intermediate steps. Preference optimization methods such as Direct Preference Optimization (DPO) have improved answer-level alignment, but they often overlook the reasoning process itself, providing little supervision over intermediate steps that are critical for complex problem-solving. Existing fine-grained approaches typically rely on strong annotators or reward models to assess the quality of individual steps. However, reward models are vulnerable to reward hacking. To address this, we propose ISLA, a reward-model-free framework that constructs step-level preference data directly from SFT gold traces. ISLA also introduces a self-improving pruning mechanism that identifies informative steps based on two signals: their marginal contribution to final accuracy (relative accuracy) and the model’s uncertainty, inspired by the concept of information gain. Empirically, ISLA achieves better performance than DPO while using only 12% of the training tokens, demonstrating that careful step-level selection can significantly improve both reasoning accuracy and training efficiency.
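The abstract does not say how ISLA combines its two signals, so the following is only an illustrative sketch under stated assumptions, not the authors' implementation. It shows the standard DPO loss for one preference pair, plus a hypothetical step selector that scores each gold step by its accuracy gain (relative accuracy) plus a weighted uncertainty term. The names `select_informative_steps`, `acc_after_prefix`, `step_entropies`, `keep_fraction`, and `weight`, and the additive scoring rule, are assumptions introduced for illustration.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair of sequence log-probs."""
    margin = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    # -log(sigmoid(margin)) written in a numerically stable form.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def select_informative_steps(acc_after_prefix, step_entropies,
                             keep_fraction=0.5, weight=1.0):
    """Hypothetical selector: rank steps by accuracy gain plus uncertainty.

    acc_after_prefix[k] is an estimate of final-answer accuracy when the
    model is given the first k gold steps (k = 0..n).
    step_entropies[k] is an uncertainty estimate for step k (e.g. entropy).
    The additive score is an assumption, not taken from the paper.
    """
    n = len(step_entropies)
    if len(acc_after_prefix) != n + 1:
        raise ValueError("need one accuracy estimate per prefix length 0..n")
    scores = [
        (acc_after_prefix[k + 1] - acc_after_prefix[k]) + weight * step_entropies[k]
        for k in range(n)
    ]
    keep = max(1, math.ceil(keep_fraction * n))
    ranked = sorted(range(n), key=lambda k: scores[k], reverse=True)
    return sorted(ranked[:keep])  # indices of kept steps, in trace order


# Toy example with made-up numbers: four gold steps, keep half of them.
kept = select_informative_steps(
    acc_after_prefix=[0.20, 0.25, 0.70, 0.72, 0.90],
    step_entropies=[0.10, 0.90, 0.05, 0.60],
)
print(kept)  # [1, 3]
```

In a setup like this, only the kept steps would contribute step-level preference pairs to the DPO objective, which is one plausible way that pruning could reduce the number of training tokens.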
