Reinforcement learning (RL) has emerged as the predominant paradigm for training large language model (LLM) agents to solve complex, multi-step tasks through environmental interaction. A fundamental challenge in such long-horizon scenarios is credit assignment, as delayed rewards provide inadequate signals for evaluating the contribution of individual actions. Existing methods typically neglect trajectory transition dynamics, which leads to coarse-grained or biased credit assignment. To address these limitations, we introduce SHADOW, a novel framework that systematically incorporates transition dynamics for improved credit assignment. Our framework makes two primary contributions: (i) a dynamics-aware state grouping mechanism that mitigates misleading action comparisons between dynamically inconsistent states, and (ii) a local dynamic advantage estimator that leverages Generalized Advantage Estimation (GAE) to precisely quantify individual action contributions through a fine-grained analysis of transition patterns. Comprehensive experiments with the Qwen2.5-1.5B/7B-Instruct agent models demonstrate that our method achieves success rate improvements of 9.4%/7.6% on the ALFWorld benchmark and a performance gain of over 5% on WebShop.
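The GAE machinery the abstract builds on can be sketched as follows. This is a generic illustration of the standard GAE recurrence (delta-based TD errors discounted backward by gamma and lambda), not SHADOW's specific local dynamic advantage estimator; the function name and the sample trajectory numbers are hypothetical.

```python
import numpy as np

def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Standard Generalized Advantage Estimation over one trajectory.

    rewards: r_0 .. r_{T-1} for each step.
    values:  V(s_0) .. V(s_T), including a bootstrap value for the final state.
    Returns per-step advantages A_0 .. A_{T-1}.
    """
    T = len(rewards)
    advantages = np.zeros(T)
    last_adv = 0.0
    # Sweep backward: A_t = delta_t + gamma * lam * A_{t+1}
    for t in reversed(range(T)):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        last_adv = delta + gamma * lam * last_adv
        advantages[t] = last_adv
    return advantages

# Sparse, delayed reward (only the last step pays off) -- the backward
# recurrence propagates credit to the earlier actions as well.
adv = gae_advantages([0.0, 0.0, 1.0], [0.1, 0.2, 0.5, 0.0])
```

With a single terminal reward, every earlier step still receives a positive (discounted) advantage, which is exactly why GAE is a natural starting point for long-horizon credit assignment.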