We present Hexaïssa, a novel framework for adaptive chess engine routing that formulates expert selection as a Mixture-of-Experts (MoE) problem. Hexaïssa learns a gating policy that dynamically selects among heterogeneous state-of-the-art engines—such as Stockfish, LCZero, and Obsidian—based on the tactical and strategic complexity of each board state. This adaptivity enables stronger play and more efficient computation than any fixed engine or static configuration. However, training such a gating policy is fundamentally challenging due to sparse optimization signals and long-horizon credit assignment in chess games. To address these challenges, we introduce a score-based inverse reinforcement learning (IRL) method that models expert engine trajectories as samples from a latent distribution over optimal behaviors. By recovering the Stein score function of this distribution via stochastic differential equations (SDEs), we infer dense, per-move reward signals consistent with potential-based IRL. These latent rewards allow efficient training of the gating network without requiring additional environment interaction or human supervision. Empirical results on standard chess benchmarks demonstrate that Hexaïssa significantly outperforms individual engines, conventional MoE models, and IRL baselines.
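To make the routing idea concrete, here is a minimal sketch of the two mechanisms the abstract describes: a softmax gating policy that maps board features to a distribution over engines, and a potential-based shaping function that turns a learned potential into dense per-move rewards. Everything here is illustrative — the feature dimension, the linear gate, and the engine list are assumptions, not the paper's actual architecture or its score-based IRL training procedure.

```python
import numpy as np

# Illustrative pool of expert engines (names from the abstract).
ENGINES = ["Stockfish", "LCZero", "Obsidian"]

def softmax(z):
    """Numerically stable softmax."""
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

def gate(features, W, b):
    """Gating policy: probability distribution over engines for one board state.

    A linear gate is a stand-in for whatever network the paper trains;
    `features` is a hypothetical vector summarizing the position.
    """
    return softmax(W @ features + b)

def route(features, W, b):
    """Route the position to the engine with the highest gating probability."""
    probs = gate(features, W, b)
    return ENGINES[int(np.argmax(probs))], probs

def shaped_reward(phi_s, phi_s_next, gamma=0.99):
    """Potential-based dense reward: r(s, s') = gamma * Phi(s') - Phi(s).

    In the paper's setting, Phi would be derived from the recovered Stein
    score of the expert-trajectory distribution; here it is just a scalar.
    """
    return gamma * phi_s_next - phi_s

# Toy usage with random gate weights over 4 made-up board features.
rng = np.random.default_rng(0)
W = rng.normal(size=(len(ENGINES), 4))
b = np.zeros(len(ENGINES))
engine, probs = route(np.array([0.2, -1.0, 0.5, 0.1]), W, b)
```

The potential-based form is the standard one from reward shaping: because the per-move rewards telescope along a game, they densify the learning signal without changing which gating policies are optimal, which matches the abstract's claim that the latent rewards are "consistent with potential-based IRL".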