Non-Markovian Tasks (NMTs) are distinguished by their dependence on long-term memory and history-dependent dynamics, setting them apart from the traditional Markovian models typically employed in Reinforcement Learning (RL). NMTs not only suffer from reward sparseness but also rely on historical information, making their resolution considerably more challenging. In this paper, we propose a novel RL framework, T4NMTD (Transition-centric framework for NMT Decomposition), designed specifically for learning NMTs specified in temporal logic. The core of T4NMTD is a task decomposition mechanism combined with a parallel training approach for NMTs. An NMT is first decomposed into basic units based on the transitions of the automata derived from temporal logic formulae. The units are then modularized into sub-tasks according to their semantic similarity under logical interpretation. The training strategy of T4NMTD adopts a dual-level structure: the high level learns to shape the boundaries and coordinate the arrangement of the sub-tasks from a global perspective, while the low level learns those sub-tasks in parallel. In addition, we introduce a dynamic policy intervention scheme to mitigate the policy myopia issue that arises during parallel training. A comprehensive evaluation is conducted on benchmark problems across various metrics. The experimental results demonstrate that T4NMTD effectively addresses NMTs, achieving significant performance improvements over related studies.
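To make the decomposition idea concrete, the following is a minimal illustrative sketch, not the authors' implementation: a temporal-logic task such as "eventually observe a, then eventually observe b" compiles to a small automaton, each progress-making transition becomes a basic unit, and units are then grouped into sub-tasks. The automaton, the self-loop convention, and the similarity criterion (grouping by the event symbol a transition consumes) are all simplifying assumptions made here for illustration.

```python
from collections import defaultdict

# Hypothetical DFA for "eventually a, then eventually b":
# state 0 --a--> 1 --b--> 2 (accepting); "other" self-loops mean "keep waiting".
transitions = {
    (0, "a"): 1, (0, "other"): 0,
    (1, "b"): 2, (1, "other"): 1,
    (2, "other"): 2,
}
accepting = {2}

# Step 1: decompose into basic units -- the automaton transitions that
# make progress toward acceptance (self-loops are excluded).
units = [(src, sym, dst) for (src, sym), dst in transitions.items() if src != dst]

# Step 2: modularize units into sub-tasks by a simple stand-in for
# semantic similarity: group by the event symbol a transition consumes.
subtasks = defaultdict(list)
for src, sym, dst in units:
    subtasks[sym].append((src, dst))

print(units)           # [(0, 'a', 1), (1, 'b', 2)]
print(dict(subtasks))  # {'a': [(0, 1)], 'b': [(1, 2)]}
```

Under this sketch, each sub-task could be handed to its own low-level learner, while a high-level coordinator decides how the sub-task boundaries are arranged, mirroring the dual-level structure described in the abstract.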
