poster

ACL 2024

August 22, 2024

Bangkok, Thailand

Assessing In-context Learning and Fine-tuning for Topic Classification of German Web Data

keywords:

german web data

topic classification

in-context learning

Researchers in political and social sciences often rely on classification models to analyze trends in information consumption by examining browsing histories that span millions of webpages. Automated, scalable methods are necessary because manual labeling is impractical. In this paper, we model the detection of policy-related content as a binary classification task and compare the accuracy of fine-tuned pre-trained encoder models against in-context learning strategies. Using only a few hundred annotated data points per topic, we detect content related to three German policies in a database of scraped webpages. We compare multilingual and monolingual models, as well as zero-shot and few-shot approaches, and investigate the impact of negative sampling strategies and of combining URL and content-based features. Our research shows that a small sample of annotated data is sufficient to train an effective classifier. Fine-tuning encoder-based models yields better results than in-context learning. Classifiers using both URL and content-based features perform best, while the URL alone provides adequate accuracy when content is unavailable.
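As a rough illustration of combining URL and content-based features, with a fallback to the URL alone when content is unavailable, the following minimal sketch builds one input string per webpage for a downstream binary classifier. It is not the authors' implementation: the function names, the separator string, the character limit, and the example URL are all hypothetical choices made for this sketch.

```python
import re
from typing import List, Optional
from urllib.parse import urlparse

# Arbitrary separator between URL tokens and page text (an assumption of this
# sketch; a real encoder pipeline would typically pass a text pair instead).
SEPARATOR = " [SEP] "


def url_to_tokens(url: str) -> List[str]:
    """Split a URL's host and path into lowercase word-like tokens."""
    parsed = urlparse(url)
    raw = f"{parsed.netloc} {parsed.path}"
    # Split on anything that is not a digit or a (German) letter.
    tokens = re.split(r"[^0-9A-Za-zÄÖÜäöüß]+", raw)
    return [t.lower() for t in tokens if t and t.lower() != "www"]


def build_input(url: str, content: Optional[str], max_content_chars: int = 500) -> str:
    """Combine URL tokens with page text; use the URL alone if text is missing."""
    url_part = " ".join(url_to_tokens(url))
    if not content or not content.strip():
        return url_part
    return url_part + SEPARATOR + content.strip()[:max_content_chars]
```

For example, `build_input("https://www.example.de/politik/beispiel-gesetz", None)` yields only the URL tokens, so the same classifier can still score pages whose text could not be scraped.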
