Work-in-progress: Corrected Self Imitation Learning via Demonstrations
DOI: 10.48448/fgvf-wt42
While reinforcement learning (RL) agents have the remarkable ability to learn by interacting with their environments, this process is often slow and data-inefficient. Because environment interaction is typically expensive, many approaches have been studied to speed up RL. One popular approach is to leverage human knowledge via imitation learning (IL), in which a demonstrator provides examples of the desired behavior that the agent seeks to imitate. In this work in progress, we propose a new way of integrating IL and deep RL, which we call corrected self imitation learning, in which an agent provided with demonstrations learns faster than an agent without them. Our method does not increase the number of environment interactions compared to a baseline RL method and works well even when the demonstrator is not an expert. We evaluate our method on the Atari game Ms. Pac-Man and achieve promising results, indicating that our method has the potential to speed up deep RL algorithms.
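To make the general idea concrete, the sketch below shows a self-imitation-style update in which demonstration transitions are mixed into the agent's replay buffer and the policy imitates only transitions whose observed return exceeds the current value estimate (the standard self-imitation learning objective). This is a minimal illustration under assumptions, not the paper's "corrected" algorithm; all names here (SILBuffer, sil_loss, etc.) are hypothetical.

```python
# Illustrative sketch only: a self-imitation-style update seeded with
# demonstration transitions. The loss follows the standard self-imitation
# learning objective; the corrected variant proposed in this paper is
# not shown, and all identifiers are hypothetical.
import random
from collections import namedtuple

import torch
import torch.nn.functional as F

# ret = observed (discounted) return from this state onward
Transition = namedtuple("Transition", ["state", "action", "ret"])


class SILBuffer:
    """Replay buffer holding both agent and demonstration transitions."""

    def __init__(self, capacity=100_000):
        self.capacity = capacity
        self.data = []

    def add(self, state, action, ret):
        # Drop the oldest transition once the buffer is full.
        if len(self.data) >= self.capacity:
            self.data.pop(0)
        self.data.append(Transition(state, action, ret))

    def sample(self, batch_size):
        batch = random.sample(self.data, min(batch_size, len(self.data)))
        states = torch.stack([t.state for t in batch])
        actions = torch.tensor([t.action for t in batch], dtype=torch.long)
        returns = torch.tensor([t.ret for t in batch], dtype=torch.float32)
        return states, actions, returns


def sil_loss(policy_logits, values, actions, returns):
    """Self-imitation loss: imitate only transitions whose return
    exceeds the current value estimate (advantage clipped at zero)."""
    clipped_adv = (returns - values.squeeze(-1)).clamp(min=0.0)
    log_probs = F.log_softmax(policy_logits, dim=-1)
    chosen = log_probs.gather(1, actions.unsqueeze(1)).squeeze(1)
    policy_term = -(chosen * clipped_adv.detach()).mean()
    value_term = 0.5 * (clipped_adv ** 2).mean()
    return policy_term + value_term
```

In such a setup, demonstration trajectories would be inserted into the buffer before (or alongside) the agent's own experience, so the agent can learn from them off-policy without any additional environment interactions; this matches the abstract's claim only in spirit, as the exact mechanism of the proposed method is not detailed here.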