AAAI 2026

January 25, 2026

Singapore, Singapore


Reinforcement learning (RL) has emerged as the predominant paradigm for training large language model (LLM) agents to solve complex, multi-step tasks through environmental interaction. A fundamental challenge in such long-horizon scenarios is credit assignment: delayed rewards provide inadequate signals for evaluating the contribution of individual actions. Existing methods typically neglect trajectory transition dynamics, leading to coarse-grained or biased credit assignment. To address these limitations, we introduce SHADOW, a novel framework that systematically incorporates transition dynamics for improved credit assignment. Our framework makes two primary contributions: (i) a dynamics-aware state grouping mechanism that mitigates misleading action comparisons between dynamically inconsistent states, and (ii) a local dynamic advantage estimator that leverages Generalized Advantage Estimation (GAE) to precisely quantify individual action contributions through a fine-grained analysis of transition patterns. Comprehensive experiments with the Qwen2.5-1.5B/7B-Instruct agent models demonstrate that our method achieves success-rate improvements of 9.4%/7.6% on the ALFWorld benchmark and a performance gain of over 5% on WebShop.
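The abstract's local dynamic advantage estimator builds on standard GAE, which propagates temporal-difference residuals backward through a trajectory so that sparse, delayed rewards still yield per-step advantage signals. As a minimal sketch of that standard building block (the function name, hyperparameter values, and toy trajectory below are illustrative, not taken from the paper):

```python
def gae(rewards, values, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation for one trajectory.

    rewards: r_0 .. r_{T-1}
    values:  V(s_0) .. V(s_T), including a bootstrap value for the final state
    """
    advantages = [0.0] * len(rewards)
    gae_acc = 0.0
    for t in reversed(range(len(rewards))):
        # TD residual: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        # Exponentially weighted sum of future residuals
        gae_acc = delta + gamma * lam * gae_acc
        advantages[t] = gae_acc
    return advantages

# A sparse, delayed reward typical of long-horizon agent tasks:
# only the final step is rewarded, yet every step gets a credit signal.
adv = gae(rewards=[0.0, 0.0, 1.0], values=[0.1, 0.2, 0.5, 0.0])
```

Even with zero intermediate rewards, the backward pass assigns nonzero advantages to earlier actions, which is the credit-assignment signal that SHADOW's estimator refines with transition-pattern analysis.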
