
workshop paper
"Gotta catch 'em all!": Retrieving people in Ancient Greek texts combining transformer models and domain knowledge
keywords:
ancient greek
gazetteers
named entity recognition
transformers
In this paper, we present a study of Named Entity Recognition (NER) as applied to Ancient Greek texts, with an emphasis on identifying individuals. Recent research shows that, while the task remains difficult, transformer models yield significant improvements. In the first part of the paper, we therefore compare the performance of four transformer models on NER for the categories of people, locations and groups, and add an out-of-domain test set to the existing datasets. Results on this set highlight the shortcomings of the models when confronted with a random sample of sentences. Hence, in the second part of the paper, we narrow our approach to the category of people, so that domain knowledge can be included. First, we simplify the task to a binary PERS/MISC classification at the token level, starting from capitalised words. Next, we test the use of domain and linguistic knowledge to improve the results. We find that including simple gazetteer information as a binary mask has a marginally positive effect on newly annotated data, and that treebanks can help identify multi-word individuals when these are scarcely or inconsistently annotated in the available training data. We conclude with a qualitative error analysis that identifies further areas for improvement.
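
To make the second part more concrete, the following is a minimal sketch of how capitalised tokens might be selected as PERS/MISC candidates and paired with a binary gazetteer mask. It is an illustration under stated assumptions, not the implementation used in the paper: the toy gazetteer, the punctuation set and the function names (`normalise`, `is_capitalised`, `gazetteer_mask`) are all hypothetical, and how such a mask would be fed to a transformer model is left open here.

```python
import unicodedata


def normalise(token: str) -> str:
    """Strip surrounding punctuation and apply Unicode NFC normalisation."""
    return unicodedata.normalize("NFC", token.strip(".,;·:"))


# Hypothetical toy gazetteer of person names (an assumption for illustration).
GAZETTEER = {normalise(name) for name in ["Σωκράτης", "Πλάτων", "Ἀλκιβιάδης"]}


def is_capitalised(token: str) -> bool:
    """True if the token is non-empty and starts with an uppercase letter."""
    return bool(token) and token[0].isupper()


def gazetteer_mask(tokens):
    """Return (candidates, mask).

    candidates: indices of capitalised tokens, i.e. the tokens that a
                binary PERS/MISC classifier would be asked to label.
    mask:       1 if the candidate appears in the gazetteer, else 0.
    """
    candidates = [i for i, t in enumerate(tokens) if is_capitalised(normalise(t))]
    mask = [1 if normalise(tokens[i]) in GAZETTEER else 0 for i in candidates]
    return candidates, mask


# Invented example sentence: "Socrates was conversing in Athens."
tokens = ["ὁ", "Σωκράτης", "ἐν", "Ἀθήναις", "διελέγετο", "."]
candidates, mask = gazetteer_mask(tokens)
```

In this sketch only the capitalised tokens become candidates, and the mask marks which of them match a known name. A place name such as Ἀθήναις is still a candidate but receives a mask value of 0, so the classifier would have to decide between PERS and MISC without gazetteer support.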