Identifying tweets of personal health experience through word embedding and LSTM neural network

Abstract

Background: As Twitter has become an active data source for health surveillance research, efficient and effective methods are needed to identify tweets related to personal health experience. Conventional classification algorithms rely on features engineered by human domain experts, a challenging task that requires substantial human effort. The resulting features may not be optimal for the classification problem. Because personal experience can be expressed and described in many different ways in tweets, such features can make it difficult for conventional classifiers to correctly classify personal experience tweets (PETs). In this study, we developed a method that combines word embedding with a long short-term memory (LSTM) model and requires no feature engineering. Through word embedding, tweet texts were represented as dense vectors, which were then fed to the LSTM neural network as sequences.

Results: Statistical analyses of 10-fold cross-validation results for our method and for conventional methods indicate significant differences (p<0.01) in accuracy, precision, recall, F1-score, and ROC/AUC. This demonstrates that our approach outperforms the conventional methods in identifying PETs.

Conclusion: We presented an efficient and effective method of identifying health-related personal experience tweets by combining word embedding and an LSTM neural network. Our method eliminates the laborious and costly process of feature engineering. It could therefore help accelerate and scale up the analysis of social media text for health surveillance.
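
The pipeline summarized above runs in four steps: tweet text, then dense word vectors, then an LSTM over the sequence, then a PET/non-PET score. The sketch below illustrates that pipeline in plain Python. It is not the authors' implementation. The abstract does not specify the embedding source, the dimensions, the tokenizer, the output layer, or the training setup. For that reason, the following are hypothetical choices made for illustration only:

* the random embeddings,
* `EMBED_DIM` and `HIDDEN_DIM`,
* whitespace tokenization,
* the sigmoid output unit,
* the class name `TinyLSTMClassifier`.

```python
import math
import random

# Hypothetical sizes; the abstract does not report the study's dimensions.
EMBED_DIM = 4
HIDDEN_DIM = 3
GATES = ("i", "f", "o", "c")  # input, forget, output gates and cell candidate


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


class TinyLSTMClassifier:
    """Untrained embedding + LSTM tweet classifier (hypothetical, illustrative only)."""

    def __init__(self, vocab, seed=0):
        rng = random.Random(seed)
        # Word embedding: each known token maps to a dense vector.
        # Random here; real embeddings would be learned or pretrained.
        self.embeddings = {
            word: [rng.uniform(-0.1, 0.1) for _ in range(EMBED_DIM)] for word in vocab
        }
        self.unknown = [0.0] * EMBED_DIM  # vector for out-of-vocabulary tokens
        in_dim = EMBED_DIM + HIDDEN_DIM
        # One weight matrix and bias vector per gate, applied to [x_t; h_{t-1}].
        self.W = {
            g: [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(HIDDEN_DIM)]
            for g in GATES
        }
        self.b = {g: [0.0] * HIDDEN_DIM for g in GATES}
        # Assumed output layer: final hidden state -> single logit.
        self.w_out = [rng.uniform(-0.1, 0.1) for _ in range(HIDDEN_DIM)]
        self.b_out = 0.0

    def embed(self, tokens):
        return [self.embeddings.get(t, self.unknown) for t in tokens]

    def step(self, x, h, c):
        """One LSTM time step: returns the new hidden and cell states."""
        z = x + h  # list concatenation of input vector and previous hidden state
        pre = {
            g: [s + bias for s, bias in zip(matvec(self.W[g], z), self.b[g])]
            for g in GATES
        }
        i = [sigmoid(v) for v in pre["i"]]
        f = [sigmoid(v) for v in pre["f"]]
        o = [sigmoid(v) for v in pre["o"]]
        cand = [math.tanh(v) for v in pre["c"]]
        c_new = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, cand)]
        h_new = [ov * math.tanh(cv) for ov, cv in zip(o, c_new)]
        return h_new, c_new

    def predict_proba(self, tweet):
        """Score in (0, 1) that `tweet` is a personal experience tweet (PET)."""
        tokens = tweet.lower().split()  # naive whitespace tokenization (assumption)
        h = [0.0] * HIDDEN_DIM
        c = [0.0] * HIDDEN_DIM
        for x in self.embed(tokens):  # feed the embedded tweet as a sequence
            h, c = self.step(x, h, c)
        logit = sum(w * v for w, v in zip(self.w_out, h)) + self.b_out
        return sigmoid(logit)


if __name__ == "__main__":
    model = TinyLSTMClassifier(["i", "have", "a", "bad", "flu"])
    print(model.predict_proba("I have a bad flu"))
```

The weights here are random and untrained, so scores hover around 0.5. Training by backpropagation through time against labeled PET/non-PET tweets is deliberately left out of this sketch.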

Citation (APA)

Jiang, K., Feng, S., Song, Q., Calix, R. A., Gupta, M., & Bernard, G. R. (2018). Identifying tweets of personal health experience through word embedding and LSTM neural network. BMC Bioinformatics, 19. https://doi.org/10.1186/s12859-018-2198-y