Thailand

The paper presents two distinct approaches to Task 6 of the SMM4H’24 workshop: extracting self-reported exact age information from social media posts across platforms. The task focuses on developing methods for automatically extracting self-reported ages from posts on two prominent social media platforms: Twitter (now X) and Reddit. The work leverages two models, the Mistral-7B-Instruct-v0.2 Large Language Model (LLM) and the pre-trained language model BERTweet, to achieve robust and generalizable age classification, surpassing limitations of existing methods that rely on predefined age groups. The proposed models aim to advance the automatic extraction of self-reported exact ages from social media posts, enabling more nuanced analyses and insights into user demographics across different platforms.

ACL 2024

SMM4H’24 Task6 : Extracting Self-Reported Age with LLM and BERTweet: Fine-Grained Approaches for Social Media Text

age extraction

hugging face

bertweet

social media

workshop paper

### Welcome!
The 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024) will take place in Bangkok, Thailand from August 11th to 16th, 2024. Our Virtual Poster Sessions will take place online Thursday, August 22, 2024.



This paper evaluates the performance of "AAST-NLP" in the Social Media Mining for Health (SMM4H) Shared Tasks 3 and 6, where more than 20 teams participated in each. We leveraged state-of-the-art transformer-based models, including Mistral, to achieve our results. Our models consistently outperformed both the mean and median scores across the tasks. Specifically, an F1-score of 0.636 was achieved in classifying the impact of outdoor spaces on social anxiety symptoms, while an F1-score of 0.946 was recorded for the classification of self-reported exact ages.

AAST-NLP@#SMM4H’24: Finetuning Language Models for Exact Age Classification and Effect of Outdoor Spaces on Social Anxiety

This paper presents our work for Task 5 of the Social Media Mining for Health Applications 2024 Shared Task: binary classification of English tweets reporting children’s medical disorders. We present and compare multiple approaches for automatically classifying tweets from parents based on whether they mention having a child with attention-deficit/hyperactivity disorder (ADHD), autism spectrum disorders (ASD), delayed speech, or asthma. We use an ensemble of various BERT-based models trained on the provided dataset, which yields an F1 score of 0.901 on the test data.

CogAI@SMM4H 2024: Leveraging BERT-based Ensemble Models for Classifying Tweets on Developmental Disorders

This study describes the approach of Team ADE Oracle for Task 1 of the Social Media Mining for Health Applications (#SMM4H) 2024 shared task. Task 1 challenges participants to detect adverse drug events (ADEs) within English tweets and normalize these mentions against the Medical Dictionary for Regulatory Activities standards. Our approach utilized a two-stage NLP pipeline consisting of a named entity recognition model, retrained to recognize ADEs, followed by vector similarity assessment with a RoBERTa-based model. Despite achieving a relatively high recall of 37.4% in the extraction of ADEs, indicative of effective identification of potential ADEs, our model encountered challenges with precision. We found marked discrepancies in recall and precision between the test set and our validation set, which underscores the need for further efforts to prevent overfitting and enhance the model’s generalization capabilities for practical applications.

ADE Oracle at #SMM4H 2024: A Two-Stage NLP System for Extracting and Normalizing Adverse Drug Events from Tweets

The proliferation of LLMs in various NLP tasks has sparked debates regarding their reliability, particularly in annotation tasks where biases and hallucinations may arise. In this shared task, we address the challenge of distinguishing annotations made by LLMs from those made by human domain experts in the context of COVID-19 symptom detection from tweets in Latin American Spanish. This paper presents BrainStorm @ iREL’s approach to the SMM4H 2024 Shared Task. Leveraging the inherent topical information in tweets, we propose a novel approach to identify and classify annotations, aiming to enhance the trustworthiness of annotated data.

BrainStorm @ iREL at #SMM4H 2024: Leveraging Translation and Topical Embeddings for Annotation Detection in Tweets

We describe the methods and results of our submission to the 9th Social Media Mining for Health Research and Applications (SMM4H) 2024 shared tasks 4 and 5. Task 4 involved extracting the clinical and social impacts of non-medical substance use and task 5 focused on the binary classification of tweets reporting children’s medical disorders. We employed encoder language models and their ensembles, achieving the top score on task 4 and a high score for task 5.

UKYNLP@SMM4H2024: Language Model Methods for Health Entity Tagging and Classification on Social Media (Tasks 4 & 5)

Adverse drug events (ADEs) pose major public health risks, with traditional reporting systems often failing to capture them. Our proposed pipeline, called Deep-LLMADEminer, used natural language processing approaches to tackle this issue for #SMM4H 2024 shared task 1. Using annotated tweets, we built a three-part pipeline: RoBERTa for classification, GPT-4-turbo for span extraction, and BioBERT for normalization. Our models achieved F1-scores of 0.838, 0.306, and 0.354, respectively, offering a novel system for Task 1 and similar pharmacovigilance tasks.

LHS712_ADENotGood at #SMM4H 2024 Task 1: Deep-LLMADEminer: A deep learning and LLM pharmacovigilance pipeline for extraction and normalization of adverse drug event mentions on Twitter

This paper describes the work undertaken as part of the SMM4H-2024 shared task, specifically Task 5, which involves the binary classification of English tweets reporting children’s medical disorders. The primary objective is to develop a system capable of automatically identifying tweets from users who report their pregnancy and mention children with specific medical conditions, such as attention-deficit/hyperactivity disorder (ADHD), autism spectrum disorders (ASD), delayed speech, or asthma, while distinguishing them from tweets that merely reference a disorder without much context. Our approach leverages advanced natural language processing techniques and machine learning algorithms to accurately classify the tweets. The system achieved an overall F1-score of 0.87, highlighting its robustness and effectiveness in addressing the classification challenge posed by this task.

HaleLab_NITK@SMM4H’24: Binary Classification of English Tweets reporting Children’s Medical Disorders

This paper explores the potential of social media as a rich source of data for understanding public health trends and behaviors, particularly focusing on emotional well-being and the impact of environmental factors. We employed large language models (LLMs) and developed a suite of knowledge extension techniques to analyze social media content related to mental health issues, specifically examining 1) effects of outdoor spaces on social anxiety symptoms on Reddit, 2) tweets reporting children’s medical disorders, and 3) self-reported ages in Twitter and Reddit posts. Our knowledge extension approach encompasses both supervised data (i.e., sample augmentation and cross-task fine-tuning) and unsupervised data (i.e., knowledge distillation and cross-task pre-training), tackling the inherent challenges of sample imbalance and informality of social media language. The effectiveness of our approach is demonstrated by the superior performance across multiple tasks (i.e., Tasks 3, 5 and 6) at SMM4H-2024. Notably, we achieved the best performance in all three tasks, underscoring the utility of our models in real-world applications.

CTYUN-AI@SMM4H-2024: Knowledge Extension Makes Expert Models

This paper presents our models for the Social Media Mining for Health 2024 shared task, specifically Task 5, which involves classifying tweets reporting a child with childhood disorders (annotated as "1") versus those merely mentioning a disorder (annotated as "0"). We utilized a classification model enhanced with diverse textual and language model-based augmentations. To ensure quality, we used semantic similarity, perplexity, and lexical diversity as evaluation metrics. Combining supervised contrastive learning and cross-entropy-based learning, our best model, incorporating R-drop and various LM generation-based augmentations, achieved an impressive F1 score of 0.9230 on the test set, surpassing the task mean and median scores.

KUL@SMM4H2024: Optimizing Text Classification with Quality-Assured Augmentation Strategies

This paper summarizes our participation in the Shared Task 4 of #SMM4H 2024. Task 4 was a named entity recognition (NER) task identifying clinical and social impacts of non-medical substance use in English Reddit posts. We employed the Bidirectional Encoder Representations from Transformers (BERT) model to complete this task. Our team achieved an F1-score of 0.892 on a validation set and a relaxed F1-score of 0.191 on the test set.
