Combine unsupervised learning and heuristic rules to annotate organism morphological descriptions

Abstract

Biodiversity literature is a comprehensive compilation of information on living organisms and fossils. Much of its rich factual information on the characteristics of organisms is presented in narrative form, which limits its repurposing and reuse. Transforming narrative information into atomic forms has been of special concern to informatics researchers and biologists alike. Previous research has reported similar results but lacks a detailed, scientific evaluation that would help illuminate the problem and eventually lead to a higher-performing approach. Because morphological descriptions are written in a sublanguage, general-purpose natural language processing (NLP) tools are thought to be ineffective in this application, and a heuristic-based approach has been suggested in the literature. In this paper, we report our experiments with such an approach, in which a set of simple, intuitive heuristic rules, informed by the results of an unsupervised learning algorithm, is used to segment taxonomic descriptions and identify organs along with their associated character/value pairs (e.g., color=white, shape=ovoid). This model system allows us to investigate the character annotation problem further, study the characteristics of morphological descriptions, identify where the system fails, and suggest ways to address those failures. One such suggestion is to use general-purpose syntactic parsers in a controlled manner.
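
To make the target output concrete, here is a minimal sketch of rule-based annotation that turns a description into organ plus character/value records. It is not the authors' system. The glossary `CHARACTER_GLOSSARY`, the organ list `ORGANS`, and the functions `segment`, `annotate_clause`, and `annotate` are hypothetical. The vocabulary that an unsupervised learner would induce from a corpus is hard-coded here, and the segmentation rule (split at periods and semicolons) is an assumed simplification.

```python
import re

# Hypothetical glossary mapping value terms to character names.
# In the approach described above, such knowledge is informed by an
# unsupervised learner; here it is simply hard-coded for illustration.
CHARACTER_GLOSSARY = {
    "white": "color",
    "yellow": "color",
    "ovoid": "shape",
    "globose": "shape",
    "glabrous": "pubescence",
}

# Hypothetical organ vocabulary, standing in for learned organ terms.
ORGANS = {"leaf", "leaves", "petal", "petals", "seed", "seeds", "stem", "stems"}


def segment(description: str) -> list[str]:
    """Split a description into clauses at periods and semicolons (assumed heuristic)."""
    return [c.strip() for c in re.split(r"[.;]", description) if c.strip()]


def annotate_clause(clause: str) -> dict:
    """Treat the first known organ word as the subject of the clause and map
    every glossary term to its character. If two terms share a character,
    the later one wins (a simplification)."""
    tokens = re.findall(r"[a-z]+", clause.lower())
    organ = next((t for t in tokens if t in ORGANS), None)
    characters = {CHARACTER_GLOSSARY[t]: t for t in tokens if t in CHARACTER_GLOSSARY}
    return {"organ": organ, "characters": characters}


def annotate(description: str) -> list[dict]:
    """Segment a description, then annotate each clause independently."""
    return [annotate_clause(c) for c in segment(description)]


if __name__ == "__main__":
    for record in annotate("Petals white, glabrous; seeds ovoid."):
        print(record)
```

For the input "Petals white, glabrous; seeds ovoid." this sketch yields one record for the petals (color=white, pubescence=glabrous) and one for the seeds (shape=ovoid). Fixed vocabularies like these are exactly where such a system fails on unseen terms, which is why learned vocabularies and, as the abstract suggests, controlled use of syntactic parsers are attractive.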

Citation (APA)

Cui, H., Singaram, S., & Janning, A. (2011). Combine unsupervised learning and heuristic rules to annotate organism morphological descriptions. In Proceedings of the ASIST Annual Meeting (Vol. 48). https://doi.org/10.1002/meet.2011.14504801031

Readers' Seniority

Tooltip

PhD / Post grad / Masters / Doc 3

43%

Researcher 3

43%

Professor / Associate Prof. 1

14%

Readers' Discipline

Tooltip

Computer Science 3

50%

Social Sciences 1

17%

Agricultural and Biological Sciences 1

17%

Environmental Science 1

17%

Save time finding and organizing research with Mendeley

Sign up for free