Evaluating the effect of annotation size on measures of semantic similarity

19Citations
Citations of this article
28Readers
Mendeley users who have this article in their library.

This article is free to access.

Abstract

Background: Ontologies are widely used as metadata in biological and biomedical datasets. Measures of semantic similarity utilize ontologies to determine how similar two entities annotated with classes from ontologies are, and semantic similarity is increasingly applied in applications ranging from diagnosis of disease to investigation in gene networks and functions of gene products. Results: Here, we analyze a large number of semantic similarity measures and the sensitivity of similarity values to the number of annotations of entities, difference in annotation size and to the depth or specificity of annotation classes. We find that most similarity measures are sensitive to the number of annotations of entities, difference in annotation size as well as to the depth of annotation classes; well-studied and richly annotated entities will usually show higher similarity than entities with only few annotations even in the absence of any biological relation. Conclusions: Our findings may have significant impact on the interpretation of results that rely on measures of semantic similarity, and we demonstrate how the sensitivity to annotation size can lead to a bias when using semantic similarity to predict protein-protein interactions.

Cite

CITATION STYLE

APA

Kulmanov, M., & Hoehndorf, R. (2017). Evaluating the effect of annotation size on measures of semantic similarity. Journal of Biomedical Semantics, 8(1). https://doi.org/10.1186/s13326-017-0119-z

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free