Abstract
First-hand experience of changes in one's health condition, and an understanding of such experience, can play an important role in advancing medical science and healthcare. Monitoring the safe use of medications is an important task of pharmacovigilance, and consumers' first-hand experience of the effects of the medications they take can offer valuable insight into how the human body reacts to them. Social media have been considered as a possible alternative data source for gathering personal experience with medications posted by users. Identifying personal experience tweets is a challenging classification task, and efforts have been made to tackle it using supervised approaches that require annotated data. Unlabeled Twitter data are abundant, and being able to use such data for training without sacrificing classification performance is of great value because it can reduce the cost of the laborious annotation process. We investigated two semi-supervised learning methods, pseudo-labeling and consistency regularization, with different mixes of labeled and unlabeled data in the training set, to understand the impact on classification performance. Our results show that both methods yielded a noticeable improvement in F1 score when the labeled set was small, and that consistency regularization could still provide a small gain even when a larger labeled set was used.
Cite
Zhu, M., & Jiang, K. (2021). Semi-Supervised Language Models for Identification of Personal Health Experiential from Twitter Data: A Case for Medication Effects. In Proceedings of the 20th Workshop on Biomedical Language Processing, BioNLP 2021 (pp. 228–237). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2021.bionlp-1.25