We present ProLuCID, a new algorithm for protein identification using tandem mass spectrometry and protein sequence databases. The algorithm uses a binomial probability as a preliminary scoring scheme to select candidate peptides for final scoring. The binomial probability scores generated by ProLuCID have no significant molecular weight bias and are independent of database size. The final scores are computed using a modified cross-correlation function that models the isotopic distributions of fragment ions of candidate peptides, which ultimately yields higher sensitivity and specificity than SEQUEST. In addition, ProLuCID exploits high-resolution MS/MS data, which significantly improves specificity compared with low-resolution tandem MS data. Using DTASelect2 and a 5% false positive rate, ProLuCID can identify 15–25% more proteins than SEQUEST. ProLuCID is designed to be faster than published algorithms when searching large MS/MS datasets. To reduce search times, we used an object-oriented approach that improves computational efficiency and yields performance improvements of 200% compared with SEQUEST. With high precursor mass accuracy and database preprocessing, the speed improvements approach 1,500%. ProLuCID is implemented in Java and can be easily installed on a single computer or a computer cluster.
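The abstract does not say how the binomial preliminary score is computed. A common formulation, shown here only as a minimal sketch under stated assumptions and not as ProLuCID's actual code, treats each of the n theoretical fragment ions of a candidate peptide as an independent trial that matches an observed peak by chance with probability p, then scores the candidate by -log10 P(X ≥ k), where k is the observed number of matches and X ~ Binomial(n, p). The class name `BinomialScore`, the helper methods, the `massRange` parameter, and the random-match model p = (number of peaks × 2 × tolerance) / massRange are all hypothetical choices made for illustration.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch of a binomial preliminary score for ranking candidate
 * peptides. Not ProLuCID's implementation; the random-match model is assumed.
 */
public class BinomialScore {

    /** Counts theoretical fragment m/z values that have an observed peak within +/- tol. */
    static int countMatches(double[] observedSorted, double[] theoretical, double tol) {
        int matches = 0;
        for (double mz : theoretical) {
            int idx = Arrays.binarySearch(observedSorted, mz);
            if (idx >= 0) { matches++; continue; } // exact hit
            int ins = -idx - 1; // insertion point: index of first peak greater than mz
            boolean hit = (ins < observedSorted.length && observedSorted[ins] - mz <= tol)
                       || (ins > 0 && mz - observedSorted[ins - 1] <= tol);
            if (hit) matches++;
        }
        return matches;
    }

    /** Natural log of the binomial coefficient C(n, k), computed as a sum of logs. */
    static double logChoose(int n, int k) {
        double s = 0.0;
        for (int i = 1; i <= k; i++) {
            s += Math.log(n - k + i) - Math.log(i);
        }
        return s;
    }

    /** Natural log of P(X >= k) for X ~ Binomial(n, p), combined with log-sum-exp to avoid underflow. */
    static double logUpperTail(int n, int k, double p) {
        if (k <= 0) return 0.0;                        // P(X >= 0) = 1
        if (k > n || p <= 0.0) return Double.NEGATIVE_INFINITY;
        double[] terms = new double[n - k + 1];
        double max = Double.NEGATIVE_INFINITY;
        for (int i = k; i <= n; i++) {
            double t = logChoose(n, i) + i * Math.log(p) + (n - i) * Math.log1p(-p);
            terms[i - k] = t;
            max = Math.max(max, t);
        }
        double sum = 0.0;
        for (double t : terms) sum += Math.exp(t - max);
        return max + Math.log(sum);
    }

    /** Preliminary score -log10 P(X >= k); larger values are less likely to arise by chance. */
    static double score(double[] observedSorted, double[] theoretical, double tol, double massRange) {
        // Assumed model: chance that one theoretical ion falls within tol of any observed peak
        double p = Math.min(0.999, observedSorted.length * 2.0 * tol / massRange);
        int k = countMatches(observedSorted, theoretical, tol);
        return -logUpperTail(theoretical.length, k, p) / Math.log(10.0);
    }

    public static void main(String[] args) {
        double[] observed = {175.119, 262.151, 333.188, 446.272, 517.309, 600.5}; // sorted m/z
        double[] theoretical = {175.119, 262.151, 333.188, 446.272, 517.309, 630.4};
        int k = countMatches(observed, theoretical, 0.5);
        double s = score(observed, theoretical, 0.5, 2000.0);
        System.out.printf("matches=%d score=%.2f%n", k, s);
    }
}
```

Working in log space keeps very small tail probabilities representable, and because the score depends only on n, k, and p for one spectrum, it does not depend on how many candidates the database contains, which is in the spirit of the database-size independence the abstract reports.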
Xu, T., Venable, J. D., Park, S. K., Cociorva, D., Lu, B., Liao, L., … Yates III, J. R. (2006). ProLuCID, a fast and sensitive tandem mass spectra-based protein identification program. Molecular & Cellular Proteomics, 5, S174.