Would you like to see your presentation here, made available to a global audience of researchers?
Add your own presentation or have us affordably record your next conference.
We study the problem of solving one-sided, zero-sum, partially observable stochastic games (POSGs). These games model sequential interactions between two adversaries, where one player has partial observability of the game state. They are applicable to many important domains, such as robotics and cybersecurity. Solving such games is computationally challenging since the solution depends on the first player's belief about the game state, which belongs to a continuous (and often high-dimensional) belief space. In the literature, only a single method has demonstrated reliable performance for solving these types of games, namely Heuristic Search Value Iteration (HSVI). However, this method is restricted to small games. We address this limitation by presenting a new method with similar approximation and convergence guarantees but improved scalability and flexibility, which we call SAB: Shapley iteration with Aggregated Beliefs. Our method aggregates the belief space into a finite set of representative beliefs and computes their values through Shapley iteration. It then approximates the value function of the POSG through interpolation from these values. We prove that SAB converges and provide a bound on its approximation error. Experiments across several benchmark games show that SAB matches the performance of HSVI on small game instances while also scaling to larger games. Moreover, we find that SAB is up to 79% faster than HSVI at obtaining a near-optimal approximation.