
Premium content
Access to this content requires a subscription. You must be a premium user to view this content.

Would you like to see your presentation here, made available to a global audience of researchers?
Add your own presentation or have us affordably record your next conference.
keywords:
ml
reinforcement learning
The centralized planning for simultaneous-move decentralized execution paradigm emerged as the state-of-the-art approach to $\epsilon$-optimally solving a decentralized partially observable Markov decision process. However, scalability remains a significant issue.
This paper presents a novel---more scalable---alternative, namely sequential central planning for simultaneous-move decentralized execution.
This methodology further pushes the applicability of bellman's principle of optimality, raising three new properties.
First, it allows a central planner to reason upon sufficient sequential-move statistics instead of prior simultaneous-move ones.
Next, it proves that $\epsilon$-optimal value functions are piecewise linear and convex in such sufficient sequential-move statistics.
Finally, it drops the complexity of the backup operators from double exponential to polynomial at the expense of longer planning horizons.
Besides, it makes it easy to use single-agent methods, e.g. SARSA algorithm enhanced with these findings applies while still preserving convergence guarantees.
Experiments on standard $2$- as well as $n$-agent benchmarks from the literature against state-of-the-art $\epsilon$-optimal simultaneous-move solvers, confirm the superiority of the novel approach.
This paradigm opens the door for efficient planning and reinforcement learning methods for multi-agent systems.
