Thailand

In this paper, we present our approach to addressing the binary classification tasks, Tasks 5 and 6, as part of the Social Media Mining for Health (SMM4H) text classification challenge. Both tasks involved working with imbalanced datasets that featured a scarcity of positive examples. To mitigate this imbalance, we employed a Large Language Model to generate synthetic texts with positive labels, aiming to augment the training data for our text classification models. Unfortunately, this method did not significantly improve model performance. Through clustering analysis using text embeddings, we discovered that the generated texts significantly lacked diversity compared to the raw data. This finding highlights the challenges of using synthetic text generation for enhancing model efficacy in real-world applications, specifically in the context of health-related social media data.
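The diversity comparison described in the abstract above can be illustrated with a small sketch. This is not the authors' clustering analysis: it swaps in a simpler proxy, the mean pairwise cosine distance within a set of texts, and uses bag-of-words counts as a stand-in for real text embeddings. The function names and the toy inputs are assumptions made for illustration only.

```python
import math
from collections import Counter
from itertools import combinations


def bow_vector(text):
    """Toy stand-in for a text embedding (assumption): lowercase token counts."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def mean_pairwise_distance(texts):
    """Average cosine distance over all pairs; higher means more diverse."""
    vectors = [bow_vector(t) for t in texts]
    pairs = list(combinations(vectors, 2))
    if not pairs:
        return 0.0
    return sum(1.0 - cosine_similarity(a, b) for a, b in pairs) / len(pairs)


# Hypothetical toy data: compare a varied set with a repetitive set.
varied = ["my son was diagnosed last week", "asthma inhaler refill again"]
repetitive = ["my child has asthma", "my child has asthma too"]
print(mean_pairwise_distance(varied) > mean_pairwise_distance(repetitive))
```

Under this proxy, a generated set that scores much lower than the raw data would be consistent with the lack of diversity the abstract reports, though a real analysis would use learned embeddings and clustering.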

ACL 2024

UTRad-NLP at #SMM4H 2024: Why LLM-Generated Texts Fail to Improve Text Classification Models

natural language processing:large language model:text classification:data augmentation:synthetic data

workshop paper

### Welcome!
The 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024) will take place in Bangkok, Thailand from August 11th to 16th, 2024. Our Virtual Poster Sessions will take place online Thursday, August 22, 2024.


This document describes our system used for the Social Media Mining for Health (SMM4H) 2024 Task 5. The objective of this task was to perform binary classification on the tweets provided in the dataset. The dataset contained two categories of tweets: those reporting medical disorders and those merely mentioning the disease. Our method uses the RoBERTa-Large model with 5-fold cross-validation. The evaluation results yielded an F1-score of 0.886 on the validation dataset and 0.823 on the test dataset.
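As a rough illustration of the 5-fold cross-validation mentioned above, the sketch below splits example indices into five folds. The abstract does not say whether the folds were stratified; stratifying by label is an assumption made here because the task data is binary and may be imbalanced. The function name and the toy label list are hypothetical.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs with each class spread across k folds."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled indices of this class round-robin into the folds.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)

    for i in range(k):
        val_idx = sorted(folds[i])
        train_idx = sorted(
            idx for j, fold in enumerate(folds) if j != i for idx in fold
        )
        yield train_idx, val_idx


# Hypothetical toy labels: 10 positive and 40 negative examples.
toy_labels = [1] * 10 + [0] * 40
for train_idx, val_idx in stratified_k_fold(toy_labels):
    # A model would be trained on train_idx and scored on val_idx here.
    print(len(train_idx), len(val_idx))
```

Each example lands in exactly one validation fold, so averaging the five validation scores gives an estimate that uses all of the labeled data.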

PheonixTrio918 at SMM4H 2024: 5 Fold Cross Validation for Classification of tweets reporting children’s disorders

We present our approach to solving the task of identifying the effect of outdoor activities on social anxiety based on Reddit posts. We employed state-of-the-art transformer models enhanced with a combination of advanced loss functions. Data augmentation techniques were also used to address class imbalance within the training set. Our method achieved a macro-averaged F1 score of 0.655 on the test data, exceeding the shared task's mean F1 score of 0.519. These findings suggest that integrating weighted loss functions improves the performance of transformer models in classifying unbalanced text data, while data augmentation can improve the model's ability to generalize.
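The abstract above does not specify which loss functions were combined. As one common example of a weighted loss for imbalanced classes, the sketch below computes inverse-frequency class weights and a class-weighted cross-entropy. The weighting scheme, function names, and toy values are assumptions, not the authors' configuration.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each class by n / (n_classes * class_count), so rare classes count more."""
    counts = Counter(labels)
    n, n_classes = len(labels), len(counts)
    return {cls: n / (n_classes * count) for cls, count in counts.items()}


def weighted_cross_entropy(probs, labels, weights, eps=1e-12):
    """Weighted mean negative log-likelihood.

    probs is a list of dicts mapping class -> predicted probability.
    """
    total = sum(
        weights[y] * -math.log(max(p[y], eps)) for p, y in zip(probs, labels)
    )
    return total / sum(weights[y] for y in labels)


# Hypothetical toy batch: three majority-class examples and one minority example.
toy_labels = [0, 0, 0, 1]
toy_probs = [{0: 0.9, 1: 0.1}, {0: 0.8, 1: 0.2}, {0: 0.7, 1: 0.3}, {0: 0.6, 1: 0.4}]
toy_weights = inverse_frequency_weights(toy_labels)
print(toy_weights)
print(weighted_cross_entropy(toy_probs, toy_labels, toy_weights))
```

With these weights, the single misclassified minority example contributes as much to the loss as all three majority examples together, which is the intended effect on unbalanced data.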

PCIC at SMM4H 2024: Enhancing Reddit Post Classification on Social Anxiety Using Transformer Models and Advanced Loss Functions

With the widespread increase in the use of social media platforms such as Twitter, Instagram, and Reddit, people are sharing their views on various topics. They have become more vocal on these platforms about their views and opinions on the medical challenges they are facing. This data is a valuable source of medical insights for healthcare study and research. This paper describes our adoption of transformer-based approaches for Tasks 3 and 5. For both tasks, we fine-tuned RoBERTa-large, a BERT-based architecture, and achieved F1 scores of 0.413 and 0.900 in Tasks 3 and 5, respectively.

Transformers at #SMM4H 2024: Identification of Tweets Reporting Children’s Medical Disorders And Effects of Outdoor Spaces on Social Anxiety Symptoms on Reddit Using RoBERTa

This paper presents our approach for SMM4H 2024 Task 5, focusing on identifying tweets where users discuss their child's health conditions of ADHD, ASD, delayed speech, or asthma. Our approach uses a pipeline that combines transformer-based classifiers and GPT-4 large language models (LLMs). We first address data imbalance in the training set using topic modelling and under-sampling. Next, we train RoBERTa-based classifiers on the adjusted data. Finally, GPT-4 refines the classifier's predictions for uncertain cases (confidence below 0.9). This strategy achieved significant improvement over the baseline RoBERTa models. Our work demonstrates the effectiveness of combining transformer classifiers and LLMs for extracting health insights from social media conversations.
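The final step of the pipeline above, passing low-confidence predictions to a second model, can be sketched as follows. Only the 0.9 threshold comes from the abstract. The function names are hypothetical, and `second_opinion` is a placeholder callable standing in for the GPT-4 step, whose prompt and interface the abstract does not describe.

```python
def route_by_confidence(predictions, threshold=0.9):
    """Split (text_id, label, confidence) triples into confident and uncertain lists."""
    confident, uncertain = [], []
    for text_id, label, confidence in predictions:
        target = uncertain if confidence < threshold else confident
        target.append((text_id, label))
    return confident, uncertain


def refine(predictions, second_opinion, threshold=0.9):
    """Keep confident labels; ask second_opinion to decide the uncertain ones."""
    confident, uncertain = route_by_confidence(predictions, threshold)
    final = dict(confident)
    for text_id, label in uncertain:
        # second_opinion is a hypothetical stand-in for an LLM call.
        final[text_id] = second_opinion(text_id, label)
    return final


# Hypothetical classifier outputs: (tweet id, predicted label, confidence).
toy_predictions = [("t1", 1, 0.97), ("t2", 0, 0.62), ("t3", 1, 0.90)]
confident, uncertain = route_by_confidence(toy_predictions)
print(confident)
print(uncertain)
```

Routing only the uncertain cases keeps the number of LLM calls small while leaving the classifier's high-confidence predictions untouched.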

CHAAI@SMM4H’24: Enhancing Social Media Health Prediction Certainty by Integrating Large Language Models with Transformer Classifiers

This paper describes the systems and results of our team's participation in the Social Media Mining for Health (SMM4H) 2024 Shared Task. Our team participated in two tasks: Task 1 and Task 5. Task 5 requires detecting tweets in which users report children's medical disorders. Task 1 requires teams to extract and normalize Adverse Drug Event terms in tweets. The team selected several Pre-trained Language Models and generative Large Language Models to meet these requirements. Strategies to improve performance include cloze tests, prompt engineering, and Low-Rank Adaptation, among others. Our system achieved an F1 score of 0.935, Precision of 0.954, and Recall of 0.917 in Task 5, and an overall F1 score of 0.08 in Task 1.
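As a quick consistency check on the Task 5 figures above, the standard definition of F1 as the harmonic mean of precision $P$ and recall $R$ reproduces the reported score:

$$
F_1 = \frac{2PR}{P + R} = \frac{2 \times 0.954 \times 0.917}{0.954 + 0.917} = \frac{1.749636}{1.871} \approx 0.935
$$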

PolyuCBS at SMM4H 2024: LLM-based Medical Disorder and Adverse Drug Event Detection with Low-rank Adaptation

The advent of Large Language Models (LLMs) such as Generative Pre-trained Transformers (GPT-4) marks a transformative era in Natural Language Generation (NLG). These models demonstrate the ability to generate coherent text that closely resembles human-authored content. They are easily accessible and have become invaluable tools in handling various text-based tasks, such as data annotation, report generation, and question answering. In this paper, we investigate GPT-4's ability to discern between data it has annotated and data annotated by humans, specifically within the context of tweets in the medical domain. Through experimental analysis, we observe that GPT-4 outperforms other state-of-the-art models. The dataset used in this study was provided by the SMM4H (Social Media Mining for Health Research and Applications) shared task. Our model achieved an accuracy of 0.51, ranking second in the shared task.

Deloitte at #SMM4H 2024: Can GPT-4 Detect COVID-19 Tweets Annotated by Itself?

Many individuals affected by Social Anxiety Disorder turn to social media platforms to share their experiences and seek advice. This includes discussing the potential benefits of engaging with outdoor environments. As part of #SMM4H 2024, Shared Task 3 focuses on classifying the effects of outdoor spaces on social anxiety symptoms in Reddit posts. In our contribution to the task, we explore the effectiveness of domain-specific models (trained on social media data – SocBERT) against general domain models (trained on diverse datasets – BERT, RoBERTa, GPT-3.5) in predicting the sentiment related to outdoor spaces. Further, we assess the benefits of augmenting sparse human-labeled data with synthetic training instances and evaluate the complementary strengths of domain-specific and general classifiers using an ensemble model. Our results show that (1) fine-tuned small, domain-specific models outperform large general language models in most cases. Only one large language model (GPT-4) exhibits performance comparable to the fine-tuned models (52% F1). Further, we find that (2) synthetic data does improve the performance of fine-tuned models in some cases, and (3) models do not appear to complement each other in our ensemble setup.

IMS_medicALY at #SMM4H 2024: Detecting Impacts of Outdoor Spaces on Social Anxiety with Data Augmented

Social Anxiety Disorder (SAD) is a common condition, affecting a significant portion of the population. While research suggests spending time in nature can alleviate anxiety, the specific impact on SAD remains unclear. This study explores the relationship between discussions of outdoor spaces and social anxiety on social media. We leverage transformer-based and large language models (LLMs) to analyze a social media dataset focused on SAD. We developed three methods for the task of predicting the effects of outdoor spaces on SAD in social media. A two-stage pipeline classifier achieved the best performance of our submissions with results exceeding baseline performance.

LAMA at SMM4H 2024: Experimenting with Transformer-based and Large Language Models for Classifying Effects of Outdoor Spaces on Social Anxiety in Social Media Data

The paper presents two distinct approaches to Task 6 of the SMM4H'24 workshop: extracting self-reported exact age information from social media posts across platforms. This research task focuses on developing methods for automatically extracting self-reported ages from posts on two prominent social media platforms: Twitter (now X) and Reddit. The work leverages two models, the Mistral-7B-Instruct-v0.2 Large Language Model (LLM) and the pre-trained language model BERTweet, to achieve robust and generalizable age classification, surpassing the limitations of existing methods that rely on predefined age groups. The proposed models aim to advance the automatic extraction of self-reported exact ages from social media posts, enabling more nuanced analyses and insights into user demographics across different platforms.

SMM4H’24 Task6 : Extracting Self-Reported Age with LLM and BERTweet: Fine-Grained Approaches for Social Media Text

This paper evaluates the performance of "AAST-NLP" in the Social Media Mining for Health (SMM4H) Shared Tasks 3 and 6, where more than 20 teams participated in each. We leveraged state-of-the-art transformer-based models, including Mistral, to achieve our results. Our models consistently outperformed both the mean and median scores across the tasks. Specifically, an F1-score of 0.636 was achieved in classifying the impact of outdoor spaces on social anxiety symptoms, while an F1-score of 0.946 was recorded for the classification of self-reported exact ages.
