In regions where tuberculosis (TB) is a high-burden disease, empirical anti-TB treatment is generally recommended. However, TB can mimic a number of other diseases, such as lymphoma, leading to high rates of misdiagnosis. This paper therefore proposes using machine learning and natural language processing techniques to differentiate between tuberculosis and lymphoma. To conduct this study, medical case reports were collected automatically and converted into word vectors, which were augmented with symptoms and biographical features extracted from the case reports. Different machine learning algorithms were applied to the collected data, which comprised 215 TB cases, 505 lymphoma cases and 207 "other" cases. Each algorithm was evaluated on accuracy, precision and recall. With an accuracy of up to 97.3%, and precision and recall scores of up to 96%, logistic regression performed best across datasets and metrics, and it performed better on the augmented dataset than on the word vectors alone.
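The feature construction described above (word vectors augmented with symptom features) can be illustrated with a minimal sketch. It assumes that the word vectors are TF-IDF weights, which the title of the cited paper suggests but the abstract does not state, and it uses a smoothed IDF formula chosen here for convenience. The toy reports, the `SYMPTOMS` list, and the simple substring matching in `symptom_flags` are all hypothetical stand-ins; the abstract does not describe how the authors extracted symptoms.

```python
import math
from collections import Counter

# Hypothetical toy case reports and symptom list; not data from the paper.
REPORTS = [
    "patient presented with fever night sweats and chronic cough",
    "biopsy of enlarged lymph node showed lymphoma with fever",
    "fracture of the wrist after a fall",
]
SYMPTOMS = ["fever", "night sweats", "cough", "weight loss"]


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def tfidf_vectors(docs):
    """Return (vocabulary, TF-IDF vectors) using a smoothed IDF (an assumption)."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    n_docs = len(docs)
    # Document frequency: number of reports that contain each term.
    df = Counter(t for toks in tokenized for t in set(toks))
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        # Term frequency is normalised by report length.
        vectors.append([counts[t] / len(toks) * idf[t] for t in vocab])
    return vocab, vectors


def symptom_flags(text, symptoms=SYMPTOMS):
    """One binary feature per symptom, set when the phrase occurs in the text."""
    lowered = text.lower()
    return [1.0 if s in lowered else 0.0 for s in symptoms]


def augmented_features(docs):
    """Append the symptom flags to each report's TF-IDF vector."""
    vocab, vectors = tfidf_vectors(docs)
    rows = [vec + symptom_flags(doc) for vec, doc in zip(vectors, docs)]
    return vocab, rows


vocab, features = augmented_features(REPORTS)
```

Each row of `features` holds one report's TF-IDF weights followed by its symptom flags, so the augmented matrix could be passed to any standard classifier, such as a logistic regression, and scored on accuracy, precision and recall.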
CITATION STYLE
Pholo, M. Di., Hamam, Y., Khalaf, A., & Du, C. (2019). Combining TD-IDF with symptom features to differentiate between lymphoma and tuberculosis case reports. In GlobalSIP 2019 - 7th IEEE Global Conference on Signal and Information Processing, Proceedings. Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/GlobalSIP45357.2019.8969317