AAAI 2026 Main Conference

January 24, 2026

Singapore, Singapore

Recent advances demonstrate that reinforcement learning with verifiable rewards (RLVR) significantly enhances the reasoning capabilities of large language models (LLMs). However, standard RLVR suffers from reward sparsity: when every candidate answer is incorrect, the rewards are all zero and provide no learning signal, a problem that is most acute on challenging tasks. To address this, we propose Multi-Expert Mutual Learning GRPO (MEML-GRPO), a framework that uses diverse expert prompts as system prompts to generate a broader range of responses, substantially increasing the likelihood of finding a correct solution. Additionally, we introduce an inter-expert mutual learning mechanism that facilitates knowledge sharing and transfer among experts, further boosting the model’s performance through RLVR. Extensive experiments across multiple reasoning benchmarks show that MEML-GRPO delivers significant improvements, achieving an average performance gain of 4.89% with Qwen and 11.33% with Llama, effectively addressing the core limitations of traditional RLVR methods.
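
The reward-sparsity problem described in the abstract can be illustrated with a minimal sketch of GRPO-style group-relative advantage normalization. This is not the authors' implementation: the function name, the `eps` constant, and the reward values are illustrative assumptions, and the abstract does not say how responses from different expert prompts are grouped, so the pooled group below is hypothetical.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the mean and std of its sampling group.

    Hypothetical helper for illustration; eps avoids division by zero.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# One system prompt, every sampled answer is wrong (reward 0):
# all advantages are zero, so the policy gradient carries no signal.
all_wrong = group_relative_advantages([0.0, 0.0, 0.0, 0.0])

# Assumed scenario: a more diverse set of responses in which one answer
# happens to be correct (reward 1). The advantages are now non-zero.
one_correct = group_relative_advantages([0.0, 0.0, 0.0, 1.0])

print(all_wrong)
print(one_correct)
```

In the first group the normalized advantages vanish, which is the "no learning signal" case; in the second, the single correct response receives a positive advantage and the incorrect ones a negative advantage, which is why broadening the response pool can help.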
