Evaluating the effect of annotation size on measures of semantic similarity

Maxat Kulmanov; Robert Hoehndorf

Journal Article (Open Access)

Journal of Biomedical Semantics (2017) 8(1)

DOI: 10.1186/s13326-017-0119-z

Abstract

Background: Ontologies are widely used as metadata in biological and biomedical datasets. Measures of semantic similarity use ontologies to determine how similar two entities annotated with ontology classes are, and semantic similarity is increasingly used in applications ranging from disease diagnosis to the investigation of gene networks and of the functions of gene products.

Results: Here, we analyze a large number of semantic similarity measures and the sensitivity of similarity values to the number of annotations of entities, to the difference in annotation size, and to the depth or specificity of annotation classes. We find that most similarity measures are sensitive to all three factors: well-studied and richly annotated entities will usually show higher similarity than entities with only a few annotations, even in the absence of any biological relation.

Conclusions: Our findings may have a significant impact on the interpretation of results that rely on measures of semantic similarity, and we demonstrate how the sensitivity to annotation size can lead to a bias when semantic similarity is used to predict protein-protein interactions.
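
The effect described in the Results can be illustrated with a small simulation. The sketch below is not the authors' experimental setup; it is a toy under stated assumptions: a hypothetical balanced-tree ontology (branching 3, depth 4, 121 classes), a structure-based information content IC(c) = -log(|classes subsumed by c| / N), Resnik similarity (IC of the most informative common ancestor), and best-match average (BMA) as the groupwise combination. Entities are annotated with classes drawn uniformly at random, so there is no "biological relation" between them; any increase in mean similarity with annotation size comes from the size alone. All function names are illustrative.

```python
import math
import random
from statistics import mean


def build_tree(branching=3, depth=4):
    """Hypothetical toy ontology: a balanced tree as a child -> parent map (root 0 has parent None)."""
    parent = {0: None}
    frontier, next_id = [0], 1
    for _ in range(depth):
        new_frontier = []
        for node in frontier:
            for _ in range(branching):
                parent[next_id] = node
                new_frontier.append(next_id)
                next_id += 1
        frontier = new_frontier
    return parent


def ancestors(node, parent):
    """The class itself plus all of its ancestors up to the root."""
    result = set()
    while node is not None:
        result.add(node)
        node = parent[node]
    return result


def intrinsic_ic(parent):
    """Structure-based IC (an assumption here): -log of the fraction of classes subsumed by c."""
    n = len(parent)
    subsumed = {c: 0 for c in parent}
    for c in parent:
        for a in ancestors(c, parent):
            subsumed[a] += 1
    return {c: -math.log(subsumed[c] / n) for c in parent}


def resnik(a, b, anc, ic):
    """Resnik similarity: IC of the most informative common ancestor (the root is always shared)."""
    return max(ic[c] for c in anc[a] & anc[b])


def bma(set_a, set_b, anc, ic):
    """Symmetric best-match average of pairwise Resnik similarities."""
    best_a = [max(resnik(a, b, anc, ic) for b in set_b) for a in set_a]
    best_b = [max(resnik(a, b, anc, ic) for a in set_a) for b in set_b]
    return (sum(best_a) + sum(best_b)) / (len(best_a) + len(best_b))


def mean_random_similarity(size, pairs=200, seed=0):
    """Mean BMA similarity between pairs of entities with `size` random annotations each."""
    parent = build_tree()
    anc = {c: ancestors(c, parent) for c in parent}
    ic = intrinsic_ic(parent)
    classes = list(parent)
    rng = random.Random(seed)
    scores = []
    for _ in range(pairs):
        a = rng.sample(classes, size)
        b = rng.sample(classes, size)
        scores.append(bma(a, b, anc, ic))
    return mean(scores)


if __name__ == "__main__":
    for k in (1, 5, 20):
        print(k, round(mean_random_similarity(k), 2))
```

In this toy setting, the mean similarity between unrelated, randomly annotated entities grows as each entity receives more annotations, because every class has more candidates from which to pick a best match. Swapping BMA for another groupwise strategy (for example, the maximum over all pairs) is a one-line change and makes a convenient way to probe how other measures behave.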

Citation (APA)

Kulmanov, M., & Hoehndorf, R. (2017). Evaluating the effect of annotation size on measures of semantic similarity. Journal of Biomedical Semantics, 8(1). https://doi.org/10.1186/s13326-017-0119-z