
workshop paper
KUL@SMM4H2024: Optimizing Text Classification with Quality-Assured Augmentation Strategies
keywords:
text classification
regularised dropout
lm data augmentation
This paper presents our models for the Social Media Mining for Health 2024 shared task, specifically Task 5, which involves classifying tweets reporting a child with childhood disorders (annotated as "1") versus those merely mentioning a disorder (annotated as "0"). We utilized a classification model enhanced with diverse textual and language model-based augmentations. To ensure quality, we used semantic similarity, perplexity, and lexical diversity as evaluation metrics. Combining supervised contrastive learning and cross-entropy-based learning, our best model, incorporating R-drop and various LM generation-based augmentations, achieved an impressive F1 score of 0.9230 on the test set, surpassing the task mean and median scores.
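To make the R-drop component concrete, the following is a minimal sketch of the standard R-drop objective for a single example: the same input is passed through the model twice with dropout active, and the loss adds a symmetric KL term between the two predicted distributions to the two cross-entropy terms. The function names, the weight `alpha`, and its default value are illustrative assumptions; the abstract does not state the weighting used in the paper, and the supervised contrastive term is omitted here.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of raw scores.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, label):
    # Negative log-likelihood of the gold label.
    return -math.log(probs[label])

def kl_divergence(p, q):
    # KL(p || q) for two discrete distributions over the same classes.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def r_drop_loss(logits_a, logits_b, label, alpha=1.0):
    # logits_a and logits_b stand for two forward passes of the same input
    # with different dropout masks. alpha is a hypothetical weight, not a
    # value taken from the paper.
    p, q = softmax(logits_a), softmax(logits_b)
    ce = cross_entropy(p, label) + cross_entropy(q, label)
    sym_kl = 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))
    return ce + alpha * sym_kl

# Example: binary tweet classification, gold label "1".
loss = r_drop_loss([0.2, 1.5], [0.4, 1.1], label=1, alpha=1.0)
print(round(loss, 4))
```

When the two passes agree exactly, the KL term vanishes and the loss reduces to twice the ordinary cross-entropy, so the regularizer only penalizes disagreement introduced by dropout.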