Statistical Methods in Proteomics

Weichuan Yu; Baolin Wu; Tao Huang; Xiaoye Li; Kenneth Williams; Hongyu Zhao

Book Chapter

Springer, (2006), 623-638

DOI: 10.1007/978-1-84628-288-1_34

14 Citations

21 Readers

Abstract

Proteomics technologies are rapidly evolving and attracting great attention in the post-genome era. In this chapter, we review two key applications of proteomics techniques: disease biomarker discovery and protein/peptide identification. For each application, we state the major issues related to statistical modeling and analysis, review related work, discuss the strengths and weaknesses of that work, and point out unsolved problems for future research.

We organize this chapter as follows. Section 34.1 briefly introduces mass spectrometry (MS) and tandem MS (MS/MS), with a few sample plots showing the data format. Section 34.2 focuses on MS data preprocessing. We first review approaches to peak identification and then address the problem of peak alignment. After that, we point out unsolved problems and propose a few possible solutions. Section 34.3 addresses the issue of feature selection. We start with a simple example showing the effect of a large number of features. Then we address the interaction of different features and discuss methods for reducing the influence of noise. We close the section with a discussion of the application of machine learning methods to feature selection. Section 34.4 addresses the problem of sample classification. We describe the random forest method in detail in Sect. 34.5. In Sect. 34.6 we address protein/peptide identification. We first review database searching methods in Sect. 34.6.1 and then focus on de novo MS/MS sequencing in Sect. 34.6.2. After reviewing major protein/peptide identification programs such as SEQUEST and MASCOT in Sect. 34.6.3, we conclude the section by pointing out some major issues that need to be addressed in protein/peptide identification.

Proteomics technologies are considered a major player in the analysis and understanding of protein function and biological pathways. The development of statistical methods and software for proteomics data analysis will continue to be the focus of proteomics for years to come.

Yu, W., Wu, B., Huang, T., Li, X., Williams, K., & Zhao, H. (2006). Statistical Methods in Proteomics. In Springer Handbooks (pp. 623–638). Springer. https://doi.org/10.1007/978-1-84628-288-1_34
