Branch-and-Bound (B&B) is the dominant exact solution method for Mixed Integer Linear Programs (MILP), yet its worst-case exponential time complexity poses significant challenges for large-scale instances. The growing capabilities of machine learning have spurred efforts to enhance B&B by learning data-driven branching policies. However, most existing approaches rely on imitation learning, which tends to overfit to expert demonstrations and struggles to generalize to structurally diverse or unseen instances. In this work, we propose TGPPO, a novel framework that uses Proximal Policy Optimization (PPO), a reinforcement learning algorithm, to train a branching policy aimed at improving generalization across heterogeneous MILP instances. Our approach builds on a parameterized state-space representation that dynamically captures the evolving context of the search tree. Empirical evaluations show that TGPPO often outperforms existing learning-based policies in reducing the number of nodes explored and improving the primal-dual integral, particularly on out-of-distribution instances. These results highlight the potential of reinforcement learning for developing robust and adaptable branching strategies for MILP solvers.
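To make the training setup concrete, below is a minimal PyTorch sketch of what a PPO clipped-surrogate update could look like for a branching policy that scores the fractional candidate variables at a B&B node. This is an illustrative assumption, not the TGPPO implementation: the feature dimension, network architecture, advantage source, and the names `BranchingPolicy` and `ppo_loss` are all hypothetical, and the paper's parameterized search-tree state representation is not reproduced here.

```python
import torch
import torch.nn as nn

class BranchingPolicy(nn.Module):
    """Scores candidate branching variables from per-candidate features."""
    def __init__(self, n_features: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, candidate_feats: torch.Tensor) -> torch.Tensor:
        # candidate_feats: (n_candidates, n_features) -> one logit per candidate
        return self.net(candidate_feats).squeeze(-1)

def ppo_loss(policy, feats, action, old_logp, advantage, clip_eps=0.2):
    """Clipped PPO surrogate for a single branching decision."""
    logits = policy(feats)
    logp = torch.log_softmax(logits, dim=-1)[action]
    ratio = torch.exp(logp - old_logp)          # new policy / behavior policy
    unclipped = ratio * advantage
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantage
    return -torch.min(unclipped, clipped)       # pessimistic clipped objective

# Toy usage: one node with 10 fractional candidates, 5 features each.
policy = BranchingPolicy(n_features=5)
feats = torch.randn(10, 5)
action = torch.tensor(3)       # candidate chosen when the trajectory was collected
old_logp = torch.log_softmax(policy(feats), dim=-1)[action].detach()
advantage = torch.tensor(1.5)  # e.g. from a tree-size or primal-dual-based reward
loss = ppo_loss(policy, feats, action, old_logp, advantage)
loss.backward()
```

The clipping term is what distinguishes PPO from plain policy gradients: it bounds how far a single update can move the policy away from the one that collected the data, which is useful here because branching trajectories are expensive to generate by solving MILP instances.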
