Real-world sequential decision-making problems often involve parameterized action spaces, which require both decisions about which discrete action to take and decisions about the continuous action parameters governing how that action is executed. However, existing approaches exhibit severe limitations when handling such parameterized action spaces: planning algorithms require hand-crafted action models, and reinforcement learning (RL) paradigms focus on either discrete or continuous actions but not both. This paper extends the scope of RL algorithms to long-horizon, sparse-reward settings with parameterized actions through autonomously learned state and action abstractions. We present algorithms for online learning and flexible refinement of such abstractions during RL. Empirical results show that learning such abstractions on-the-fly enables $TD(\lambda)$ to significantly outperform state-of-the-art RL approaches in terms of sample efficiency across diverse problem domains with long horizons, continuous states, and parameterized actions.
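
For concreteness, the sketch below illustrates the notion of a parameterized action: a discrete action type paired with continuous parameters that govern its execution. This is a minimal, hypothetical illustration; the class and function names are assumptions for exposition, not the paper's implementation.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class ParameterizedAction:
    """A discrete action choice paired with continuous execution parameters."""
    action_id: int             # discrete decision, e.g. "move" vs. "grasp"
    params: Tuple[float, ...]  # continuous decision, e.g. a target (x, y)

def sample_action(num_discrete: int,
                  param_bounds: List[Tuple[float, float]]) -> ParameterizedAction:
    """Uniformly sample one parameterized action (illustrative baseline only)."""
    action_id = random.randrange(num_discrete)
    params = tuple(random.uniform(lo, hi) for (lo, hi) in param_bounds)
    return ParameterizedAction(action_id, params)

# Example: 3 discrete action types, each parameterized by a 2-D target in [0, 1]^2.
print(sample_action(num_discrete=3, param_bounds=[(0.0, 1.0), (0.0, 1.0)]))
```

An agent in such a space must jointly select the discrete `action_id` and the continuous `params`, which is precisely the setting where purely discrete or purely continuous RL methods fall short.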
