Horses to Zebras: Ontology-Guided Data Augmentation and Synthesis for ICD-9 Coding

Abstract

Medical document coding is the process of assigning labels from a structured label space (an ontology such as ICD-9) to medical documents. This process is laborious, costly, and error-prone. In recent years, efforts have been made to automate it with neural models. The label spaces are large (on the order of thousands of labels) and follow a big-head, long-tail label distribution, giving rise to few-shot and zero-shot scenarios. Previous efforts tried to address these scenarios within the model, which improved results on rare labels but worsened them on frequent ones. We propose data augmentation and synthesis techniques to address these scenarios. We further introduce an analysis technique for this setting inspired by confusion matrices. This analysis points to the positive impact of data augmentation and synthesis, but also highlights more general issues: confusion within families of codes, and underprediction.
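The abstract does not spell out how the confusion-matrix-inspired analysis works. The following is a minimal sketch of one way to separate "confusion within families of codes" from "underprediction" in a multi-label setting. It is not the authors' implementation. It assumes that a code's family is its ICD-9 category, i.e. the part before the decimal point (so 428.0 and 428.1 are siblings). The names `family` and `FamilyConfusion` are hypothetical. For each gold code, it records whether the code was predicted exactly, whether it was missed while a sibling from the same family was predicted (in-family confusion), or whether it was missed with nothing from its family predicted (underprediction).

```python
from collections import Counter
from dataclasses import dataclass, field


def family(code: str) -> str:
    """Hypothetical family grouping: the ICD-9 category before the decimal point."""
    return code.split(".")[0]


@dataclass
class FamilyConfusion:
    correct: int = 0    # gold code predicted exactly
    in_family: int = 0  # gold code missed, but a non-gold sibling from its family was predicted
    missed: int = 0     # gold code missed and no non-gold code from its family predicted (underprediction)
    spurious: int = 0   # predicted non-gold code whose family contains no gold code
    # (gold code, wrongly predicted sibling) -> count; the off-diagonal cells of a
    # confusion matrix restricted to code families
    pairs: Counter = field(default_factory=Counter)

    def update(self, gold: set[str], pred: set[str]) -> None:
        """Accumulate counts for one document's gold and predicted code sets."""
        for g in gold:
            if g in pred:
                self.correct += 1
                continue
            siblings = {p for p in pred if p not in gold and family(p) == family(g)}
            if siblings:
                self.in_family += 1
                for p in siblings:
                    self.pairs[(g, p)] += 1
            else:
                self.missed += 1
        gold_families = {family(g) for g in gold}
        self.spurious += sum(
            1 for p in pred if p not in gold and family(p) not in gold_families
        )


if __name__ == "__main__":
    fc = FamilyConfusion()
    # 428.0 confused with sibling 428.1; 401.9 missed entirely; 250.00 is spurious
    fc.update(gold={"428.0", "401.9"}, pred={"428.1", "250.00"})
    fc.update(gold={"038.9"}, pred={"038.9"})
    print(fc.correct, fc.in_family, fc.missed, fc.spurious)  # 1 1 1 1
    print(fc.pairs)
```

Under this sketch, a high `in_family` count relative to `missed` would indicate confusion among sibling codes. A high `missed` count would indicate underprediction. A finer-grained family definition, such as one taken from the ontology's full hierarchy, could replace `family` without changing the rest of the code.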

Citation (APA)

Falis, M., Dong, H., Birch, A., & Alex, B. (2022). Horses to Zebras: Ontology-Guided Data Augmentation and Synthesis for ICD-9 Coding. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (pp. 389–401). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2022.bionlp-1.39