RefSeq database growth influences the accuracy of k-mer-based lowest common ancestor species identification

Abstract

To determine the role of the database in taxonomic sequence classification, we examine how changes to the database over time influence k-mer-based lowest common ancestor (LCA) taxonomic classification. We present three major findings: (1) the number of new species added to the NCBI RefSeq database greatly outpaces the number of new genera; (2) as a result, more reads are classified with newer database versions, but fewer are classified at the species level; and (3) Bayesian-based re-estimation mitigates this effect but struggles with novel genomes. These results suggest a need for new classification approaches specially adapted for large databases.
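The second finding can be illustrated with a toy example. The sketch below is not the authors' pipeline or any real classifier's implementation: the taxonomy, the genome strings, the k-mer length, and every function name are hypothetical, and the read is assigned by a simple majority vote over k-mer hits rather than by the scoring a production tool would use. It only shows why adding a closely related species to a k-mer-to-LCA index can move a read from a species-level call to a genus-level call.

```python
from collections import Counter

# Toy taxonomy (child -> parent). All names are hypothetical.
PARENT = {
    "species_A": "genus_X",
    "species_B": "genus_X",
    "genus_X": "root",
    "root": None,
}


def lineage(taxon):
    """Return the path from a taxon up to the root, taxon first."""
    path = []
    while taxon is not None:
        path.append(taxon)
        taxon = PARENT[taxon]
    return path


def lca(a, b):
    """Lowest common ancestor of two taxa in the toy tree."""
    ancestors_of_a = set(lineage(a))
    for node in lineage(b):
        if node in ancestors_of_a:
            return node
    raise ValueError("taxa share no ancestor")


def kmers(seq, k):
    """All overlapping substrings of length k."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_index(genomes, k):
    """Map each k-mer to the LCA of every taxon whose genome contains it."""
    index = {}
    for taxon, seq in genomes.items():
        for kmer in set(kmers(seq, k)):
            index[kmer] = lca(index[kmer], taxon) if kmer in index else taxon
    return index


def classify(read, index, k):
    """Simplified vote: the taxon hit by the most k-mers of the read."""
    hits = Counter(index[m] for m in kmers(read, k) if m in index)
    return hits.most_common(1)[0][0] if hits else "unclassified"


K = 4  # unrealistically small k, chosen so the example fits on one screen

# "Older" database: one species in the genus.
old_db = {"species_A": "ACGTACGGTTCA"}
# "Newer" database: a near-identical sibling species has been added.
new_db = {**old_db, "species_B": "ACGTACGGTTGA"}

read = "ACGTACGG"

print(classify(read, build_index(old_db, K), K))  # species_A
print(classify(read, build_index(new_db, K), K))  # genus_X
```

With only species_A in the index, every k-mer of the read is unique to that species, so the read receives a species-level label. Once species_B is added, the same k-mers are shared by both genomes, their LCA becomes genus_X, and the read can only be placed at the genus level.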

Cite

Nasko, D. J., Koren, S., Phillippy, A. M., & Treangen, T. J. (2018). RefSeq database growth influences the accuracy of k-mer-based lowest common ancestor species identification. Genome Biology, 19(1). https://doi.org/10.1186/s13059-018-1554-6
