Extraction of chemical-induced diseases using prior knowledge and textual information

This article is free to access.

Abstract

We describe our approach to the chemical-disease relation (CDR) task in the BioCreative V challenge. The CDR task consists of two subtasks: automatic disease-named entity recognition and normalization (DNER), and extraction of chemical-induced diseases (CIDs) from Medline abstracts. For the DNER subtask, we used our concept recognition tool Peregrine, in combination with several optimization steps. For the CID subtask, our system, which we named RELigator, was trained on a rich feature set comprising features derived from a graph database containing prior knowledge about chemicals and diseases, as well as linguistic and statistical features derived from the abstracts in the CDR training corpus. We describe the systems that were developed and present evaluation results for both subtasks on the CDR test set. For DNER, our Peregrine system reached an F-score of 0.757. For CID, the system achieved an F-score of 0.526, which ranked second among 18 participating teams. Several post-challenge modifications of the systems resulted in substantially improved F-scores (0.828 for DNER and 0.602 for CID).
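The F-scores reported above are harmonic means of precision and recall. As a minimal sketch, the snippet below computes precision, recall, and F-score over sets of predicted and gold relations. It assumes that relations are compared as (document, chemical, disease) triples; the function name `precision_recall_f1` and all identifiers in the toy data are hypothetical placeholders and are not taken from the article or the CDR corpus.

```python
def precision_recall_f1(predicted, gold):
    """Compute precision, recall, and F-score for two collections of relations.

    Each relation is a hashable item, here a (document, chemical, disease)
    triple. This representation is an assumption made for illustration.
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy data with made-up identifiers (not real corpus annotations).
gold = {
    ("doc1", "CHEM_A", "DIS_X"),
    ("doc1", "CHEM_B", "DIS_X"),
    ("doc2", "CHEM_C", "DIS_Y"),
}
predicted = {
    ("doc1", "CHEM_A", "DIS_X"),
    ("doc2", "CHEM_C", "DIS_Y"),
    ("doc2", "CHEM_C", "DIS_Z"),
    ("doc2", "CHEM_D", "DIS_Y"),
}

precision, recall, f1 = precision_recall_f1(predicted, gold)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

In this toy case two of the four predictions are correct and two of the three gold relations are found, so precision is 1/2, recall is 2/3, and the F-score is 4/7.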

Figures

  • Figure 1. Workflow for CDR extraction. The chemical and disease entities in a Medline abstract are recognized and mapped to their corresponding MeSH identifiers by tmChem (for chemicals) and Peregrine (for diseases). For each possible combination of chemicals and diseases that are found in the document, features are generated based on prior knowledge from a knowledge platform, and based on statistical and linguistic information from the document. The features are fed to an SVM classifier to detect CIDs.
  • Table 1. Characteristics of the CDR corpus
  • Figure 2. Example dependency parse tree for a sentence about the chemical ‘acetaminophen’ and the disease ‘anaphylaxis’. The governing verb of the disease is ‘produce’; the governing verb of the chemical is ‘demonstrated’, which is also the relating word.
  • Table 2. Performance of the Peregrine challenge and post-challenge systems for disease normalization on the test set
  • Table 3. Error analysis of 50 false-positive and 50 false-negative errors of the post-challenge Peregrine system
  • Table 4. Performance of different relation extraction systems on the CDR training and development data, given perfect entity annotations
  • Table 5. Performance of relation extraction systems on the CDR test data, for different entity annotations
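The candidate-generation step described in the Figure 1 caption (every chemical found in a document is paired with every disease found in it, and each pair receives features) can be sketched roughly as follows. This is an illustrative sketch only, not the RELigator implementation: the two features shown (a lookup in a toy prior-knowledge set and a sentence co-occurrence count) are simplified stand-ins, the function name `candidate_features` is hypothetical, and the SVM classification step is left out.

```python
from itertools import product


def candidate_features(chemicals, diseases, sentences, known_pairs):
    """Build a feature dict for every chemical-disease pair in one document.

    chemicals, diseases: sets of concept identifiers found in the document.
    sentences: list of sets, each holding the identifiers in one sentence.
    known_pairs: toy stand-in for prior knowledge, a set of (chemical, disease).
    All names and features here are assumptions made for illustration.
    """
    rows = {}
    for chem, dis in product(sorted(chemicals), sorted(diseases)):
        # Statistical feature: number of sentences mentioning both entities.
        cooccurrences = sum(1 for s in sentences if chem in s and dis in s)
        rows[(chem, dis)] = {
            # Prior-knowledge feature: is the pair already known?
            "prior_knowledge": 1 if (chem, dis) in known_pairs else 0,
            "sentence_cooccurrences": cooccurrences,
        }
    return rows


# Toy document with made-up identifiers.
sentences = [
    {"CHEM_A", "DIS_X"},
    {"CHEM_B"},
    {"CHEM_A", "DIS_X", "DIS_Y"},
]
features = candidate_features(
    chemicals={"CHEM_A", "CHEM_B"},
    diseases={"DIS_X", "DIS_Y"},
    sentences=sentences,
    known_pairs={("CHEM_A", "DIS_X")},
)
for pair, feats in features.items():
    print(pair, feats)
```

In a full system, each feature dict would be turned into a numeric vector and passed to a trained classifier, which decides whether the pair is a chemical-induced disease relation.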

Cite

Pons, E., Becker, B. F. H., Akhondi, S. A., Afzal, Z., Van Mulligen, E. M., & Kors, J. A. (2016). Extraction of chemical-induced diseases using prior knowledge and textual information. Database, 2016. https://doi.org/10.1093/database/baw046