RHMD: A Real-World Dataset for Health Mention Classification on Reddit

Abstract

People on social media often use disease and symptom words for purposes other than describing their health, which can introduce biases in data-driven public health applications. To advance research on health mention classification (HMC), we present the Reddit health mention dataset (RHMD), a new dataset of multi-domain Reddit data for HMC. RHMD is composed of 10,015 manually annotated Reddit posts that include 15 common disease or symptom terms and are labeled with four labels: personal health mentions (HMs), nonpersonal HMs, figurative HMs, and hyperbolic HMs. Empirical evaluation using recently proposed methods demonstrates the challenge of labeling user-generated text across these four types. The contributions of this work include the public release of a robustly annotated Reddit dataset (RHMD) for HM tasks and a comprehensive performance analysis of baseline methods. We expect that the release of the dataset and the evaluations will help facilitate the development of new methods for detecting HMs in user-generated text. The dataset is available at https://github.com/usmaann/RHMD-Health-Mention-Dataset.

Cite

Naseem, U., Khushi, M., Kim, J., & Dunn, A. G. (2023). RHMD: A Real-World Dataset for Health Mention Classification on Reddit. IEEE Transactions on Computational Social Systems, 10(5), 2325–2334. https://doi.org/10.1109/TCSS.2022.3186883