
We study how to better use digitized historical archives to answer sociological and historical questions that require more context than raw text mentions provide. Using Finnish World War II Karelian evacuee family interviews, we build on prior extraction of 350K mentions of leisure activities and organizational memberships (71K unique names), which are too diverse and unstructured to analyze directly. We introduce a categorization framework that captures key dimensions of participation: type of activity or organization, typical sociality, regularity, and level of physical demand. After creating a gold-standard annotated set, we evaluate whether large language models can apply the schema at scale, and we find that an open-weight LLM combined with simple multi-run voting closely matches expert judgments. We then label all 350K entities to produce a structured resource for downstream analyses of social integration and related outcomes.
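The multi-run voting step mentioned above can be illustrated with a minimal sketch. It assumes each LLM run returns one label per schema dimension and that the final label is the per-dimension majority. The abstract does not specify the voting rule, the number of runs, or the label vocabulary, so the dimension keys, the example labels, the tie handling, and the function names `majority_vote` and `agreement` are all hypothetical.

```python
from collections import Counter

# Hypothetical keys for the four schema dimensions named in the abstract.
DIMENSIONS = ("type", "sociality", "regularity", "physical_demand")


def majority_vote(runs):
    """Combine per-run label dicts into one label per dimension.

    `runs` is a list of dicts, one per LLM run, mapping dimension -> label.
    A dimension with no votes or a tie for first place is returned as None
    (an assumed convention, e.g. to flag the entity for manual review).
    """
    result = {}
    for dim in DIMENSIONS:
        votes = [run[dim] for run in runs if dim in run]
        if not votes:
            result[dim] = None
            continue
        counts = Counter(votes).most_common()
        top_label, top_count = counts[0]
        tied = len(counts) > 1 and counts[1][1] == top_count
        result[dim] = None if tied else top_label
    return result


def agreement(predicted, gold):
    """Fraction of dimensions where the voted label equals the gold label."""
    matches = sum(predicted.get(dim) == gold.get(dim) for dim in DIMENSIONS)
    return matches / len(DIMENSIONS)


# Illustrative labels only; not taken from the actual annotation schema.
runs = [
    {"type": "sports", "sociality": "group",
     "regularity": "regular", "physical_demand": "high"},
    {"type": "sports", "sociality": "group",
     "regularity": "occasional", "physical_demand": "high"},
    {"type": "hobby", "sociality": "group",
     "regularity": "regular", "physical_demand": "high"},
]
voted = majority_vote(runs)
```

Using an odd number of runs reduces ties for two-valued dimensions, though ties remain possible when a dimension has three or more labels, which is why the sketch returns None instead of picking arbitrarily.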
