Large Language Model (LLM) agents have demonstrated strong potential in complex, interactive decision-making tasks. However, when training LLM agents end-to-end with reinforcement learning (RL), efficiently optimizing agent policies in dynamic environments remains a significant challenge. Existing RL-based LLM agent paradigms commonly organize interactions in a cycle where reasoning is followed by action. In our work, we observe a phenomenon we call Exploration Contraction, where the explicit introduction of a reasoning stage reduces the diversity of actions—quantified by lower action entropy—which in turn limits exploration and leads to premature policy convergence. To address this limitation, we propose Act-before-Reasoning (ActRe), a two-stage RL training framework. In the first stage, we reverse the typical rollout order, prompting the agent to generate actions prior to reasoning, which encourages exploration driven by model intuition. In the second stage, we restore the standard reasoning-then-action order for training and evaluation, ensuring robust and interpretable decision-making. Experiments on the ALFWorld and WebShop benchmarks show that ActRe effectively mitigates exploration contraction, yielding consistently higher task success rates and improved training robustness compared to strong RL baselines. Our analysis underscores the importance of action entropy in the exploration-exploitation trade-off during LLM agent training and provides a practical approach to maintain the benefits of explicit reasoning while promoting sufficient exploration.
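The exploration-contraction diagnostic and the two rollout orderings described above can be sketched in a few lines. The prompt templates and the sample action lists below are illustrative assumptions, not the paper's actual prompts; the entropy helper simply computes Shannon entropy over an empirical action distribution, the quantity the abstract uses to measure action diversity.

```python
import math
from collections import Counter

def action_entropy(actions):
    """Shannon entropy (in nats) of the empirical action distribution.

    Lower entropy means less diverse actions, i.e. the
    "exploration contraction" phenomenon described above.
    """
    counts = Counter(actions)
    total = len(actions)
    return -sum((c / total) * math.log(c / total) for c in counts.values())

def build_prompt(observation, stage):
    """Hypothetical prompt templates for the two ActRe stages.

    Stage 1: act-before-reasoning rollouts (exploration by intuition).
    Stage 2: standard reasoning-then-action (training and evaluation).
    """
    if stage == 1:
        return f"Observation: {observation}\nAction:"   # act first
    return f"Observation: {observation}\nThought:"      # reason first

# Toy illustration: diverse (stage-1 style) vs. collapsed action samples.
diverse = ["go north", "open door", "take key", "examine table", "look"]
collapsed = ["go north"] * 4 + ["look"]
print(action_entropy(diverse) > action_entropy(collapsed))  # → True
```

A fully collapsed policy (all samples identical) yields zero entropy, which is the degenerate endpoint of the premature convergence the abstract warns about.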
