EMNLP 2025

November 05, 2025

Suzhou, China


Legal citation detection in court judgments underpins reliable precedent mapping, citation analytics, and document retrieval. Extracting references to legislation and case law in the United Kingdom is especially challenging: citation styles have evolved over centuries, and judgments routinely cite foreign or historical authorities. We conduct the first systematic comparison of three modelling paradigms on this task using the Cambridge Law Corpus: (i) rule-based regular expressions; (ii) transformer-based encoders (BERT, RoBERTa, LEGAL-BERT, ModernBERT); and (iii) large language models (GPT-4.1). We produce a high-quality gold-standard corpus of 190 court judgments containing 45,179 fine-grained annotations of UK and non-UK legislation and case-law references. ModernBERT achieves a macro-averaged F1 of 93.3%, only marginally ahead of the other encoder-only models but far ahead of both the strongest regular-expression baseline (35.42% F1) and GPT-4.1 (76.57% F1).
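
To make the rule-based paradigm and the headline metric concrete, here is a minimal Python sketch of a regular-expression citation detector together with an exact-match, macro-averaged F1 scorer. The patterns, the labels `CASE` and `LEGISLATION`, and the names `find_citations` and `macro_f1` are hypothetical illustrations, not the regexes or evaluation code used in the paper; the scorer also assumes exact span matching, which the abstract does not specify.

```python
import re

# Hypothetical pattern for UK neutral citations, e.g. "[2019] UKSC 41" or
# "[2015] EWHC 1234 (Ch)". Illustrative only; not the paper's regex baseline.
NEUTRAL_CITATION = re.compile(
    r"\[(?P<year>\d{4})\]\s+"
    r"(?P<court>UKSC|UKPC|UKHL|EWCA\s+(?:Civ|Crim)|EWHC|UKUT|UKFTT)\s+"
    r"(?P<number>\d+)"
    r"(?:\s+\((?P<division>[A-Za-z]+)\))?"
)

# Hypothetical, deliberately naive legislation pattern: capitalised words followed by
# "Act" and a four-digit year. Titles containing lowercase words are only partly
# matched ("Sale of Goods Act 1979" yields "Goods Act 1979").
UK_ACT = re.compile(r"\b(?:[A-Z][a-z]+\s+)+Act\s+\d{4}\b")


def find_citations(text: str) -> list[tuple[str, str, int, int]]:
    """Return (label, matched_text, start, end) tuples sorted by start offset."""
    hits = [("CASE", m.group(0), m.start(), m.end()) for m in NEUTRAL_CITATION.finditer(text)]
    hits += [("LEGISLATION", m.group(0), m.start(), m.end()) for m in UK_ACT.finditer(text)]
    return sorted(hits, key=lambda h: h[2])


def macro_f1(gold: set[tuple[str, int, int]], pred: set[tuple[str, int, int]]) -> float:
    """Per-label F1 over exact (label, start, end) span matches, averaged without weighting."""
    labels = {g[0] for g in gold} | {p[0] for p in pred}
    scores = []
    for label in sorted(labels):
        g = {s for s in gold if s[0] == label}
        p = {s for s in pred if s[0] == label}
        tp = len(g & p)
        precision = tp / len(p) if p else 0.0
        recall = tp / len(g) if g else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    # Macro averaging: every label counts equally, regardless of how frequent it is.
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    sample = ("In R (Miller) v Prime Minister [2019] UKSC 41 the court considered "
              "the Human Rights Act 1998 and [2015] EWHC 1234 (Ch).")
    for label, span, start, end in find_citations(sample):
        print(label, span)
    # CASE [2019] UKSC 41
    # LEGISLATION Human Rights Act 1998
    # CASE [2015] EWHC 1234 (Ch)
```

The partial match on "Sale of Goods Act 1979" illustrates how hand-written rules can miss citation variants, which is consistent with the low F1 the abstract reports for the regular-expression baseline.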


