
Olympiad-level benchmarks in mathematics and physics are crucial testbeds for advanced AI reasoning, but chemistry, with its unique multimodal symbolic language, has remained an open challenge. We introduce ChemO, a new benchmark built from the International Chemistry Olympiad (IChO) 2025. ChemO features two key innovations for automated assessment: Assessment-Equivalent Reformulation (AER), which converts problems requiring visual outputs (e.g., drawing molecules) into computationally tractable formats, and Structured Visual Enhancement (SVE), a diagnostic mechanism to disentangle a model’s visual perception capabilities from its core chemical reasoning. To tackle this benchmark, we propose ChemLabs, a hierarchical multi-agent framework that mimics human expert collaboration through specialized agents for problem decomposition, perception, reasoning, and auditing. Experiments on state-of-the-art multimodal models demonstrate that combining SVE with our multi-agent system yields dramatic performance gains. Our top configuration achieves a score of 93.6 out of 100, surpassing an estimated human gold medal threshold and establishing a new state-of-the-art in automated chemical problem-solving.
