Using text mining techniques to extract phenotypic information from the PhenoCHF corpus

Abstract

Background: Phenotypic information locked away in unstructured narrative text presents significant barriers to information accessibility, both for clinical practitioners and for computerised applications used for clinical research purposes. Text mining (TM) techniques have previously been applied successfully to extract different types of information from text in the biomedical domain. They have the potential to be extended to allow the extraction of information relating to phenotypes from free text.

Methods: To stimulate the development of TM systems that are able to extract phenotypic information from text, we have created a new corpus (PhenoCHF) that is annotated by domain experts with several types of phenotypic information relating to congestive heart failure. To ensure that systems developed using the corpus are robust to multiple text types, it integrates text from heterogeneous sources, i.e., electronic health records (EHRs) and scientific articles from the literature. We have developed several different phenotype extraction methods to demonstrate the utility of the corpus, and tested these methods on a further corpus, i.e., ShARe/CLEF 2013.

Results: Evaluation of our automated methods showed that PhenoCHF can facilitate the training of reliable phenotype extraction systems, which are robust to variations in text type. These results have been reinforced by evaluating our trained systems on the ShARe/CLEF corpus, which contains clinical records of various types. Like other studies within the biomedical domain, we found that solutions based on conditional random fields produced the best results, when coupled with a rich feature set.

Conclusions: PhenoCHF is the first annotated corpus aimed at encoding detailed phenotypic information. The unique heterogeneous composition of the corpus has been shown to be advantageous in the training of systems that can accurately extract phenotypic information from a range of different text types. Although the scope of our annotation is currently limited to a single disease, the promising results achieved can stimulate further work into the extraction of phenotypic information for other diseases. The PhenoCHF annotation guidelines and annotations are publicly available at https://code.google.com/p/phenochf-corpus.

Alnazzawi, N., Thompson, P., Batista-Navarro, R., & Ananiadou, S. (2015). Using text mining techniques to extract phenotypic information from the PhenoCHF corpus. In BMC Medical Informatics and Decision Making (Vol. 15). BioMed Central Ltd. https://doi.org/10.1186/1472-6947-15-S2-S3
