Scientific research articles, typically distributed in PDF format, contain valuable knowledge but remain challenging to convert into structured datasets due to fragmented workflows that separate parsing, annotation, and visualization. Existing annotation platforms operate on plain text, which requires an additional PDF-to-text conversion step before annotation, while PDF parsing tools lack automated annotation suggestions. To bridge this gap, we introduce Docora, a system that unifies PDF parsing, automated annotation assistance, and multi-view visualization into a single interactive platform. Docora enables researchers to configure entity and relation schemas for any domain, automatically generates initial annotations using rule-based, model-based, or LLM-based extractors, and provides synchronized visualizations across PDF, text, and graph views. Users can refine annotations directly on the PDF canvas, ensuring consistency between document layout and structured representations. The system’s source code is publicly available to facilitate further research and development.
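To make the idea of a configurable schema with rule-based annotation suggestions concrete, here is a minimal sketch in Python. It is not Docora's actual API or configuration format: the names `ENTITY_RULES`, `Annotation`, and `suggest_annotations`, the two entity types, and the sample sentence are all hypothetical. It covers only entity suggestions over already-extracted text, and leaves out relations, PDF layout, and the model-based and LLM-based extractors.

```python
import re
from dataclasses import dataclass

# Hypothetical entity schema: each entity type maps to one regex rule.
# This is an illustrative assumption, not Docora's configuration format.
ENTITY_RULES = {
    "MATERIAL": re.compile(r"\b(?:graphene|silicon)\b", re.IGNORECASE),
    "TEMPERATURE": re.compile(r"\b\d+(?:\.\d+)?\s?K\b"),
}


@dataclass(frozen=True)
class Annotation:
    """A suggested entity span, addressed by character offsets in the text."""
    label: str
    start: int
    end: int
    text: str


def suggest_annotations(text):
    """Apply every rule in the schema and return suggestions in reading order."""
    found = []
    for label, pattern in ENTITY_RULES.items():
        for match in pattern.finditer(text):
            found.append(Annotation(label, match.start(), match.end(), match.group()))
    return sorted(found, key=lambda a: a.start)


sample = "Graphene was annealed at 300 K on a silicon substrate."
suggestions = suggest_annotations(sample)
for annotation in suggestions:
    print(annotation)
```

Storing character offsets rather than only the matched strings is one plausible way to keep a suggestion tied to a specific location, which is what a later refinement step on the document view would need.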