Combining structured and free textual data of diabetic patients’ smoking status

Ivelina Nikolova; Svetla Boytcheva; Galia Angelova; Zhivko Angelov

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2016) 9883 LNAI 57-67

DOI: 10.1007/978-3-319-44748-3_6

Abstract

The main goal of this research is to identify and extract risk factors for Diabetes Mellitus. The data source for our experiments is 8 million outpatient records from the Bulgarian Diabetes Registry, submitted to the Bulgarian Health Insurance Fund by general practitioners and specialists of all kinds during 2014. In this paper we report our work on the automatic identification of patients' smoking status. The experiments are performed on the free-text sections of a randomly extracted subset of the registry's outpatient records. Although no rich semantic resources exist for Bulgarian, we were able to enrich our model with semantic features based on categorical vocabularies. In addition to the automatically labeled records, we use the records from the Diabetes Registry that contain diagnoses related to tobacco use. Finally, a combined result from the structured information (ICD-10 codes) and the extracted smoking-status data is associated with each patient. The reported accuracy of the best model is comparable to the highest results reported at the i2b2 Challenge 2006. The method is ready to be validated on big data after minor improvements.
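The abstract does not say exactly how the text-derived labels and the ICD-10 diagnoses are merged into one status per patient. The following is a minimal Python sketch of one plausible merge rule, not the authors' method. Several parts are assumptions: the label set is modelled on the i2b2 2006 smoking challenge categories; the tobacco-related code prefixes (F17, Z72.0) are chosen for illustration and may differ from the codes used in the paper; and `OutpatientRecord`, `has_tobacco_code` and `combine_smoking_status` are hypothetical names. The assumed rule is that the most recent record with a known text label wins, and a tobacco diagnosis code is used only as a fallback.

```python
from dataclasses import dataclass, field
from datetime import date

# Assumption: labels modelled on the i2b2 2006 smoking challenge categories.
KNOWN_LABELS = {"CURRENT SMOKER", "PAST SMOKER", "NON-SMOKER", "SMOKER"}

# Assumption: ICD-10 prefixes treated as evidence of tobacco use.
# F17 = mental and behavioural disorders due to use of tobacco; Z72.0 = tobacco use.
# Codes are assumed to be stored with the dot (e.g. "F17.2").
TOBACCO_ICD10_PREFIXES = ("F17", "Z72.0")


@dataclass
class OutpatientRecord:
    """Hypothetical record: one visit with a text-derived label and diagnosis codes."""
    patient_id: str
    visit_date: date
    text_label: str  # output of a (hypothetical) free-text classifier
    icd10_codes: list[str] = field(default_factory=list)


def has_tobacco_code(codes):
    """True if any diagnosis code starts with one of the tobacco-related prefixes."""
    return any(code.startswith(TOBACCO_ICD10_PREFIXES) for code in codes)


def combine_smoking_status(records):
    """Return {patient_id: status}, merging text labels with ICD-10 evidence."""
    by_patient = {}
    for rec in records:
        by_patient.setdefault(rec.patient_id, []).append(rec)

    result = {}
    for pid, recs in by_patient.items():
        recs.sort(key=lambda r: r.visit_date)  # oldest visit first
        # Prefer the most recent record whose free text yielded a known label.
        text_labels = [r.text_label for r in recs if r.text_label in KNOWN_LABELS]
        if text_labels:
            result[pid] = text_labels[-1]
        elif any(has_tobacco_code(r.icd10_codes) for r in recs):
            # Structured data only: a tobacco diagnosis shows use but not
            # whether it is current or past, so use the generic label.
            result[pid] = "SMOKER"
        else:
            result[pid] = "UNKNOWN"
    return result


# Usage with made-up records (not data from the registry):
records = [
    OutpatientRecord("p1", date(2014, 3, 1), "UNKNOWN", ["E11.9", "F17.2"]),
    OutpatientRecord("p2", date(2014, 1, 5), "CURRENT SMOKER", ["E11.9"]),
    OutpatientRecord("p2", date(2014, 6, 9), "PAST SMOKER", ["E11.9"]),
    OutpatientRecord("p3", date(2014, 2, 2), "UNKNOWN", ["E11.9"]),
]
print(combine_smoking_status(records))
# {'p1': 'SMOKER', 'p2': 'PAST SMOKER', 'p3': 'UNKNOWN'}
```

Letting the text label take precedence is itself a design choice. A stricter variant could flag a conflict when the text says NON-SMOKER but a tobacco code is present, instead of silently trusting the text.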

Citation (APA)

Nikolova, I., Boytcheva, S., Angelova, G., & Angelov, Z. (2016). Combining structured and free textual data of diabetic patients’ smoking status. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9883 LNAI, pp. 57–67). Springer Verlag. https://doi.org/10.1007/978-3-319-44748-3_6
