Mathematical Expression Recognition (MER) has made significant progress on simple expressions, but robust recognition of complex mathematical expressions (CMER), which contain numerous tokens and span multiple lines, remains a formidable challenge. In this paper, we first introduce CMER-Bench, a carefully constructed benchmark that categorizes expressions into three difficulty levels: normal, moderate, and complex. Using CMER-Bench, we conduct a comprehensive evaluation of existing expert MER models and general-purpose multimodal large language models (MLLMs). The results reveal that while current methods perform well on normal and moderate expressions, their performance degrades significantly on complex ones. In response, and because existing public training datasets consist primarily of simple samples, we propose CMER-17M, a large-scale dataset specifically designed for the recognition of complex mathematical expressions. This dataset provides rich and diverse samples to support the development of accurate and robust CMER models. Furthermore, to address the challenges posed by the spatial structure of complex expressions, we introduce a novel expression representation called SML, which explicitly models the hierarchical and spatial structure of mathematical content beyond what the LaTeX format captures. Based on the SML representation, we propose a specialized model named CMERNet, built upon an encoder-decoder architecture and trained on CMER-17M. Experimental results show that CMERNet, with only 0.1 billion parameters, significantly outperforms all existing expert models and MLLMs on CMER-Bench, particularly at the complex level. All data and code will be released.