VIDEO DOI: https://doi.org/10.48448/1y96-7c19

workshop paper

ACL 2024

August 15, 2024

Bangkok, Thailand

"Gotta catch `em all!": Retrieving people in Ancient Greek texts combining transformer models and domain knowledge

keywords:

ancient greek

gazetteers

named entity recognition

transformers

In this paper, we present a study of Named Entity Recognition (NER) as applied to Ancient Greek texts, with an emphasis on identifying individuals. Recent research shows that, while the task remains difficult, the use of transformer models results in significant improvements. In the first part of the paper, we therefore compare the performance of four transformer models on the task of NER for the categories of people, locations and groups, and add an out-of-domain test set to the existing datasets. Results on this set highlight the shortcomings of the models when confronted with a random sample of sentences. Hence, in the second part of the paper, we narrow down our approach to the category of people, to be able to include domain knowledge. First, we simplify the task to a binary PERS/MISC classification on the token level, starting from capitalised words. Next, we test the use of domain and linguistic knowledge to improve the results. We find that including simple gazetteer information as a binary mask has a marginally positive effect on newly annotated data, and that treebanks can be used to help identify multi-word individuals if they are scarcely or inconsistently annotated in the available training data. We conclude with a qualitative error analysis that identifies further areas of improvement.
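The abstract mentions starting from capitalised words and adding gazetteer information as a binary mask. The following is a minimal sketch of how such a mask might be built, not the authors' implementation. The function names, the accent-insensitive matching, and the tiny example gazetteer are all assumptions made for illustration.

```python
import unicodedata


def strip_diacritics(token: str) -> str:
    """Lowercase a token and drop combining marks (accents, breathings).

    Assumption: accent-insensitive matching is acceptable for lookup.
    """
    decomposed = unicodedata.normalize("NFD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch)).lower()


def gazetteer_mask(tokens, gazetteer):
    """Return one 0/1 flag per token.

    A token gets 1 only if it is capitalised (a candidate) and its
    normalised form appears in the gazetteer. The gazetteer here is a
    hypothetical set of personal names; inflected forms are not handled.
    """
    normalised = {strip_diacritics(name) for name in gazetteer}
    mask = []
    for tok in tokens:
        is_candidate = tok[:1].isupper()
        in_gazetteer = strip_diacritics(tok) in normalised
        mask.append(1 if is_candidate and in_gazetteer else 0)
    return mask


# Hypothetical example sentence and gazetteer.
tokens = ["Σωκράτης", "καὶ", "Πλάτων", "ἐν", "Ἀθήναις"]
people = {"Σωκράτης", "Πλάτων"}
mask = gazetteer_mask(tokens, people)
```

In this sketch the mask would be passed to a token classifier as one extra binary feature per token. Because Ancient Greek names are inflected, an exact-form lookup like this one would miss many case forms; matching on lemmas is one possible refinement, again an assumption rather than something the abstract states.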


