Repeat-Preserving Decoy Database for False Discovery Rate Estimation in Peptide Identification

Abstract

Sequence database searching is widely used in proteomics for peptide identification. To control the false discovery rate (FDR) of the search results, the target-decoy method generates a decoy database and searches it together with the target database. A known problem is that the target protein sequence database may contain numerous repeated peptides, and most existing decoy generation algorithms do not preserve the structures of these repeats. Previous studies suggest that this discrepancy between the target and decoy databases may lead to inaccurate FDR estimation. Based on the de Bruijn graph model, we propose a new repeat-preserving algorithm for generating decoy databases. We prove that this algorithm preserves the structures of the repeats in the target database to a great extent. Compared with a few other commonly used methods, the de Bruijn method demonstrated superior FDR estimation accuracy and an improved number of peptide identifications.
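
To make the target-decoy idea concrete, the following is a minimal sketch of how an FDR is commonly estimated from a concatenated target-decoy search: the number of decoy hits at or above a score threshold divided by the number of target hits at or above it. The abstract does not state which estimator the article uses, so this formula is an assumption, and the function name `estimate_fdr`, the `(score, is_decoy)` tuple layout, and the example scores are all hypothetical.

```python
def estimate_fdr(psms, threshold):
    """Estimate the FDR among target hits scoring at or above `threshold`.

    `psms` is an iterable of (score, is_decoy) pairs, one per
    peptide-spectrum match. This layout is an illustrative assumption,
    not the format of any particular search engine.
    """
    targets = 0
    decoys = 0
    for score, is_decoy in psms:
        if score < threshold:
            continue
        if is_decoy:
            decoys += 1
        else:
            targets += 1
    if targets == 0:
        # No accepted target hits, so there is nothing to be wrong about.
        return 0.0
    # Common estimate (assumed here): decoy hits approximate the number
    # of false target hits at the same threshold. Capped at 1.0.
    return min(1.0, decoys / targets)


# Hypothetical scores, for illustration only.
example_psms = [
    (9.1, False),
    (8.7, False),
    (8.2, True),
    (7.9, False),
    (7.5, False),
    (6.0, True),
]

fdr_at_7 = estimate_fdr(example_psms, 7.0)  # 1 decoy / 4 targets
```

The estimate is only as good as the assumption that false matches are equally likely to land in the target and decoy databases, which is why a decoy database whose repeat structure differs from the target's can bias the result.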

Citation

Moosa, J. M., Guan, S., Moran, M. F., & Ma, B. (2020). Repeat-Preserving Decoy Database for False Discovery Rate Estimation in Peptide Identification. Journal of Proteome Research, 19(3), 1029–1036. https://doi.org/10.1021/acs.jproteome.9b00555
