Retrieval-augmented generation (RAG) with foundation models has demonstrated notable capabilities across a range of tasks and domains, including natural language understanding, generation, and complex decision-making. However, its proficiency in expert-level reasoning, such as solving mathematical physics problems, remains relatively unexplored. This paper investigates the potential of RAG techniques to solve Olympiad-level mathematical physics problems, motivated by students' natural inclination to review past problems when preparing for competitions. We propose PhOPile, a high-quality, multimodal, physics-specific dataset tailored to Olympiad-level challenges. In this dataset, problems from 2021 serve as the evaluation set, while problems from other years form the knowledge base for RAG. We benchmark RAG methods with foundation models and analyze the results to inform future research, and our experiments highlight the significant role of reflection. Finally, we conduct a fine-grained evaluation of popular large language models (LLMs) and large multimodal models (LMMs) on our dataset, assessing their physics reasoning and RAG capabilities.
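The year-based split and the retrieval step described in the abstract can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the record fields (`year`, `question`, `solution`), the toy problems, the helper names, and the bag-of-words cosine retriever, which stands in for whatever retriever the paper actually uses. It is not the PhOPile format or the authors' pipeline.

```python
import math
from collections import Counter

# Hypothetical toy records; the real dataset's schema is not given in the abstract.
TOY_PROBLEMS = [
    {"year": 2019,
     "question": "A pendulum of length L swings with small amplitude. Find its period.",
     "solution": "T = 2*pi*sqrt(L/g)"},
    {"year": 2020,
     "question": "A charged particle moves in a uniform magnetic field. Find the radius of its orbit.",
     "solution": "r = m*v/(q*B)"},
    {"year": 2021,
     "question": "A pendulum of length L swings on a planet with gravity g/4. Find its period.",
     "solution": None},  # held out: the model must produce this
]


def split_by_year(problems, eval_year=2021):
    """Hold out one year for evaluation; all other years form the knowledge base."""
    eval_set = [p for p in problems if p["year"] == eval_year]
    knowledge_base = [p for p in problems if p["year"] != eval_year]
    return eval_set, knowledge_base


def bag_of_words(text):
    """Lowercased whitespace tokens with counts (a deliberately crude representation)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, knowledge_base, k=1):
    """Return the k past problems whose question text is most similar to the query."""
    q = bag_of_words(query)
    ranked = sorted(knowledge_base,
                    key=lambda p: cosine(q, bag_of_words(p["question"])),
                    reverse=True)
    return ranked[:k]


def build_prompt(target, retrieved):
    """Prepend retrieved problems and their solutions to the target question."""
    parts = []
    for p in retrieved:
        parts.append(f"Past problem ({p['year']}): {p['question']}\nSolution: {p['solution']}")
    parts.append(f"New problem: {target['question']}\nSolution:")
    return "\n\n".join(parts)


eval_set, knowledge_base = split_by_year(TOY_PROBLEMS)
retrieved = retrieve(eval_set[0]["question"], knowledge_base, k=1)
prompt = build_prompt(eval_set[0], retrieved)
```

The resulting `prompt` string would then be passed to a foundation model. Keeping the evaluation year out of the knowledge base ensures the retriever cannot return the problem being evaluated.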