Combining TD-IDF with symptom features to differentiate between lymphoma and tuberculosis case reports

Abstract

In regions where tuberculosis (TB) is a high-burden disease, empirical anti-TB treatment is generally recommended. However, TB can mimic a number of other diseases, such as lymphoma, leading to high rates of misdiagnosis. This paper therefore proposes the use of machine learning and natural language processing techniques to differentiate between tuberculosis and lymphoma. To conduct this study, medical case reports were collected automatically and converted into word vectors, which were augmented with symptom and biographical features extracted from the case reports. Different machine learning algorithms were applied to the collected data, which comprised 215 TB cases, 505 lymphoma cases and 207 "other" cases. Each algorithm was evaluated on accuracy, precision and recall. With an accuracy of up to 97.3% and precision and recall of up to 96%, logistic regression performed best across datasets and metrics, with its strongest results on the augmented dataset.
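The pipeline the abstract describes (TF-IDF word vectors augmented with symptom features, a logistic regression classifier, and scoring by accuracy, precision and recall) can be sketched as below. This is a minimal, dependency-free Python sketch under stated assumptions, not the authors' implementation: the tokenizer, the hypothetical `SYMPTOMS` keyword list, substring matching as the "extraction" step, the gradient-descent trainer with its `lr`/`epochs` defaults, and macro-averaging of precision and recall over the three classes are all assumptions, since the abstract does not specify them. The TF-IDF weighting mirrors scikit-learn's `TfidfVectorizer` defaults (raw counts, smoothed IDF, L2 normalization), although the tokenizer is simpler.

```python
import math
from collections import Counter

# Hypothetical cue list for illustration only; the paper's actual symptom and
# biographical features are not enumerated in the abstract.
SYMPTOMS = ["fever", "night sweats", "weight loss", "cough", "lymphadenopathy"]

def tokenize(text):
    """Lowercase and split on whitespace after stripping basic punctuation."""
    for ch in ".,;:()":
        text = text.replace(ch, " ")
    return text.lower().split()

def fit_idf(docs):
    """Sorted vocabulary plus smoothed IDF weights: ln((1 + n) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(tokenize(doc)))
    vocab = sorted(df)
    return vocab, {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in vocab}

def features(doc, vocab, idf):
    """L2-normalized TF-IDF vector with binary symptom flags appended (the augmentation)."""
    counts = Counter(tokenize(doc))
    vec = [counts[t] * idf[t] for t in vocab]  # terms outside the vocabulary are ignored
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    lowered = doc.lower()
    flags = [1.0 if s in lowered else 0.0 for s in SYMPTOMS]  # naive substring matching
    return [v / norm for v in vec] + flags

def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]

def scores(W, x):
    xb = x + [1.0]  # constant input for the bias weight
    return [sum(w * v for w, v in zip(row, xb)) for row in W]

def train_logreg(X, y, labels, lr=0.3, epochs=500):
    """Multinomial logistic regression fitted by full-batch gradient descent, no regularization."""
    k, d = len(labels), len(X[0]) + 1
    W = [[0.0] * d for _ in range(k)]
    target = [labels.index(c) for c in y]
    for _ in range(epochs):
        grad = [[0.0] * d for _ in range(k)]
        for x, t in zip(X, target):
            p = softmax(scores(W, x))
            xb = x + [1.0]
            for c in range(k):
                err = p[c] - (1.0 if c == t else 0.0)  # gradient of cross-entropy w.r.t. logit c
                for j in range(d):
                    grad[c][j] += err * xb[j]
        for c in range(k):
            for j in range(d):
                W[c][j] -= lr * grad[c][j] / len(X)
    return W

def predict(W, x, labels):
    s = scores(W, x)
    return labels[s.index(max(s))]

def evaluate(y_true, y_pred, labels):
    """Accuracy plus macro-averaged precision and recall (averaging scheme is an assumption)."""
    pairs = list(zip(y_true, y_pred))
    acc = sum(t == p for t, p in pairs) / len(pairs)
    prec, rec = [], []
    for c in labels:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        prec.append(tp / (tp + fp) if tp + fp else 0.0)
        rec.append(tp / (tp + fn) if tp + fn else 0.0)
    return acc, sum(prec) / len(labels), sum(rec) / len(labels)

if __name__ == "__main__":
    # Invented toy snippets, not data from the study.
    docs = ["Chronic cough, fever and night sweats; sputum smear positive.",
            "Painless lymphadenopathy and weight loss; biopsy showed Reed-Sternberg cells.",
            "Headache and rash after travel; dengue serology positive."]
    labels = ["TB", "lymphoma", "other"]
    y = ["TB", "lymphoma", "other"]
    vocab, idf = fit_idf(docs)
    X = [features(doc, vocab, idf) for doc in docs]
    W = train_logreg(X, y, labels)
    # Scored on the training snippets only to show the call pattern.
    print(evaluate(y, [predict(W, x, labels) for x in X], labels))
```

Appending the symptom flags after the normalized TF-IDF block keeps the two feature groups separate, so the same trainer can be run on the plain and the augmented vectors for comparison. A real comparison would score held-out case reports; the abstract does not state the split or validation scheme used.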

Pholo, M. Di., Hamam, Y., Khalaf, A., & Du, C. (2019). Combining TD-IDF with symptom features to differentiate between lymphoma and tuberculosis case reports. In GlobalSIP 2019 - 7th IEEE Global Conference on Signal and Information Processing, Proceedings. Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/GlobalSIP45357.2019.8969317