Linguistic and statistically derived features for cause of death prediction from verbal autopsy text

Abstract

Automatic Text Classification (ATC) is an emerging technology with economic importance given the unprecedented growth of text data. This paper reports on work in progress to develop methods for predicting Cause of Death from Verbal Autopsy (VA) documents, which the World Health Organisation recommends for use in low-income countries. VA documents contain both coded data and open narrative. We formulate the task as a Text Classification problem and explore various combinations of linguistic and statistical approaches to determine how these may improve on the standard bag-of-words approach, using a dataset of over 6400 VA documents that were manually annotated with cause of death. We demonstrate that a significant improvement in prediction accuracy can be obtained through a novel combination of statistical and linguistic features derived from the VA text. The paper explores the methods by which ATC may lead to improved accuracy in Cause of Death prediction. © 2013 Springer-Verlag.
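
For readers unfamiliar with the bag-of-words baseline the abstract refers to, the following is a minimal sketch of what such a baseline could look like. It is not the authors' implementation: the abstract does not name a classifier, so the multinomial Naive Bayes model with add-one smoothing is an assumption, the names `bag_of_words` and `NaiveBayes` are hypothetical, and the four training narratives and their labels are invented toy examples rather than records from the VA dataset.

```python
import math
import re
from collections import Counter, defaultdict


def bag_of_words(text):
    """Lowercase the text and count word tokens, ignoring word order."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing.

    Assumption: the classifier choice is illustrative only; the abstract
    does not say which learning algorithm the paper uses.
    """

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(bag_of_words(doc))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, doc):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus smoothed log likelihood of each known word.
            score = math.log(n_docs / total_docs)
            for word, freq in bag_of_words(doc).items():
                if word in self.vocab:  # skip words never seen in training
                    score += freq * math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy narratives and labels, for illustration only.
train_docs = [
    "child had high fever and convulsions for three days",
    "fever with shivering and vomiting before death",
    "persistent cough and difficulty breathing for two weeks",
    "cough with chest pain and fast breathing",
]
train_labels = ["malaria", "malaria", "pneumonia", "pneumonia"]

model = NaiveBayes().fit(train_docs, train_labels)
prediction = model.predict("the child had fever and vomiting")
print(prediction)
```

A sketch like this treats each narrative as an unordered multiset of words. The linguistic and statistically derived features the abstract describes would add to or replace these raw word counts.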

Cite

Danso, S., Atwell, E., & Johnson, O. (2013). Linguistic and statistically derived features for cause of death prediction from verbal autopsy text. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 8105 LNAI, pp. 47–60). https://doi.org/10.1007/978-3-642-40722-2_5
