Extracting relations from text: From word sequences to dependency paths

Razvan C. Bunescu; Raymond J. Mooney

Book Chapter (Open Access)

In Natural Language Processing and Text Mining, Springer London (2007), pp. 29-44

DOI: 10.1007/978-1-84628-754-1_3

Abstract

Extracting semantic relationships between entities mentioned in text documents is an important task in natural language processing. The various types of relationships discovered between mentions of entities can provide useful structured information to a text mining system [1]. Traditionally, the task specifies a predefined set of entity types and relation types that are deemed relevant to a potential user and that are likely to occur in a particular text collection. For example, information extraction from newspaper articles is usually concerned with identifying mentions of people, organizations, and locations, and with extracting useful relations between them. Relevant relation types range from social relationships to roles that people hold inside an organization, relations between organizations, and the physical locations of people and organizations. Scientific publications in the biomedical domain offer a type of narrative that is very different from newspaper discourse. Significant effort is currently devoted to automatically extracting relevant pieces of information from Medline, an online collection of biomedical abstracts. Proteins, genes, and cells are examples of relevant entities in this task, whereas subcellular localizations and protein-protein interactions are two of the relation types that have recently received significant attention. The inherent difficulty of relation extraction is further compounded in the biomedical domain by the relative scarcity of tools able to analyze this type of narrative. Most existing natural language processing tools, such as tokenizers, sentence segmenters, part-of-speech (POS) taggers, and shallow or full parsers, are trained on newspaper corpora and consequently incur a loss in accuracy when applied to biomedical literature.
Therefore, information extraction systems developed for biological corpora must either be robust to POS-tagging and parsing errors or achieve reasonable performance using shallower but more reliable information, such as chunking instead of full parsing. © 2007 Springer-Verlag London Limited.
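To make the task setup described in the abstract concrete, here is a minimal Python sketch under stated assumptions; it is not the method of the chapter. It assumes a hypothetical schema (the relation names `INTERACTS_WITH` and `LOCATED_IN`, the `Mention` tuple, and the helpers `candidate_pairs` and `words_between` are all illustrative). Candidate relation instances are pairs of entity mentions whose types fit a predefined signature, and the tokens between the two mentions serve as a shallow representation that requires no parser at all.

```python
from itertools import combinations
from typing import NamedTuple


class Mention(NamedTuple):
    start: int   # index of the mention's first token
    end: int     # index one past the mention's last token
    etype: str   # entity type, e.g. "PROTEIN" (hypothetical label set)


# Hypothetical predefined schema: the (type, type) signatures each relation may link.
SCHEMA = {
    "INTERACTS_WITH": {("PROTEIN", "PROTEIN")},
    "LOCATED_IN": {("PERSON", "LOCATION"), ("ORGANIZATION", "LOCATION")},
}


def candidate_pairs(mentions, schema=SCHEMA):
    """Yield (m1, m2, relation) for each mention pair whose types fit the schema.

    Mentions are sorted by position, so m1 always precedes m2 in the sentence;
    type signatures are checked in either order.
    """
    for m1, m2 in combinations(sorted(mentions), 2):
        for rel, signatures in schema.items():
            if (m1.etype, m2.etype) in signatures or (m2.etype, m1.etype) in signatures:
                yield m1, m2, rel


def words_between(tokens, m1, m2):
    """Shallow, parser-free feature: tokens strictly between two non-overlapping mentions."""
    left, right = (m1, m2) if m1.start < m2.start else (m2, m1)
    return tokens[left.end:right.start]


if __name__ == "__main__":
    # "ProtA" and "ProtB" are placeholder protein names.
    tokens = ["ProtA", "binds", "directly", "to", "ProtB"]
    a, b = Mention(0, 1, "PROTEIN"), Mention(4, 5, "PROTEIN")
    for m1, m2, rel in candidate_pairs([b, a]):
        print(rel, words_between(tokens, m1, m2))
```

Running the script prints `INTERACTS_WITH ['binds', 'directly', 'to']`. Filtering by type signatures keeps the candidate set small, and the word sequence between mentions is only one possible shallow representation; a chunker or dependency parser could supply richer structure where it is reliable enough.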

Citation (APA)

Bunescu, R. C., & Mooney, R. J. (2007). Extracting relations from text: From word sequences to dependency paths. In Natural Language Processing and Text Mining (pp. 29–44). Springer London. https://doi.org/10.1007/978-1-84628-754-1_3