This paper defines and explores the design space for information extraction (IE) from layout-rich documents using large language models (LLMs). The three core challenges of layout-aware IE with LLMs are 1) data structuring, 2) model engagement, and 3) output refinement. Our study investigates the sub-problems within these core challenges, such as input representation, chunking, prompting, and the selection of LLMs and multimodal models. It examines the effect of different design choices through a new layout-aware IE test suite, benchmarking against traditional, fine-tuned IE models. Our results on two datasets show that our one-factor-at-a-time (OFAT) exploration method achieves near-optimal results: it scores only 0.8--1.8 points below the best configuration found by full factorial exploration while requiring a fraction (~2.8%) of the computation, and it gains 13.3--37.5 points over a baseline configuration. We demonstrate that, if well-configured, general-purpose LLMs match the performance of specialized models, providing a cost-effective, label-free alternative.
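To illustrate why OFAT exploration needs only a small fraction of the computation of a full factorial sweep, here is a minimal sketch. The factor names, levels, and scoring function below are hypothetical stand-ins, not the paper's actual design dimensions; OFAT here fixes each factor at its best-scoring level in turn, evaluating one configuration per untried level instead of every combination.

```python
from itertools import product

# Hypothetical design factors (names and levels are illustrative only).
factors = {
    "input_repr": ["plain_text", "xml_tags", "coordinates"],
    "chunking":   ["none", "page", "sliding_window"],
    "prompting":  ["zero_shot", "few_shot"],
    "model":      ["model_a", "model_b"],
}

def full_factorial(factors):
    """Enumerate every combination of every factor level."""
    keys = list(factors)
    return [dict(zip(keys, combo)) for combo in product(*factors.values())]

def ofat(factors, score):
    """One-factor-at-a-time: start from a baseline (the first level of
    each factor), then sweep each factor in turn, keeping its best level.
    Returns the chosen configuration and the number of evaluations."""
    config = {k: levels[0] for k, levels in factors.items()}
    n_evals = 1  # the baseline configuration itself
    for k, levels in factors.items():
        best = max(levels, key=lambda lv: score({**config, k: lv}))
        n_evals += len(levels) - 1  # baseline level was already scored
        config[k] = best
    return config, n_evals

# Toy scoring function for demonstration; a real one would run IE
# with the given configuration and return an F1 score.
score = lambda cfg: sum(len(v) for v in cfg.values())

best_cfg, n_ofat = ofat(factors, score)
n_full = len(full_factorial(factors))
print(n_ofat, n_full)  # 7 evaluations vs. 3*3*2*2 = 36
```

With four factors of sizes 3, 3, 2, and 2, OFAT evaluates 7 configurations against 36 for the full factorial, and the gap widens rapidly as factors and levels are added.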