PCA and K-Means decipher genome

Alexander N. Gorban; Andrei Y. Zinovyev

Conference Proceedings

Lecture Notes in Computational Science and Engineering (2008) 58 309-323

DOI: 10.1007/978-3-540-73750-6_14

Abstract

In this paper, we aim to give a tutorial for undergraduate students studying statistical methods and/or bioinformatics. The students will learn how data visualization can help in genomic sequence analysis. Students start with a fragment of genetic text of a bacterial genome and analyze its structure. By means of principal component analysis they "discover" that the information in the genome is encoded by non-overlapping triplets. Next, they learn how to find gene positions. This exercise on PCA and K-Means clustering enables active study of the basic bioinformatics notions. The Appendix contains program listings that accompany this exercise.
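
To make the idea of "non-overlapping triplets" concrete, the following is a minimal sketch of a feature-extraction step that could precede PCA and K-Means. It is not the authors' Appendix listing. The function names, the window width, the step, and the toy sequence are all assumptions chosen for illustration. Each window of genetic text is cut into non-overlapping triplets starting at a chosen shift, and the 64 triplet frequencies form one data point.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"
# All 64 possible triplets in a fixed order, so every vector has the same layout.
TRIPLETS = ["".join(p) for p in product(ALPHABET, repeat=3)]


def triplet_frequencies(window, shift=0):
    """Return a 64-dimensional frequency vector of non-overlapping triplets.

    `shift` (0, 1, or 2) selects where reading starts inside the window.
    Triplets containing letters outside ACGT are ignored.
    """
    counts = Counter(
        window[i:i + 3] for i in range(shift, len(window) - 2, 3)
    )
    total = sum(counts[t] for t in TRIPLETS)
    if total == 0:
        return [0.0] * len(TRIPLETS)
    return [counts[t] / total for t in TRIPLETS]


def sliding_window_features(sequence, width=300, step=3, shift=0):
    """Build one frequency vector per window along the sequence.

    `width` and `step` are illustrative defaults, not values from the source.
    """
    sequence = sequence.upper()
    return [
        triplet_frequencies(sequence[start:start + width], shift)
        for start in range(0, len(sequence) - width + 1, step)
    ]


# Hypothetical toy "genome": a strictly periodic text, not real data.
toy = "ATGGCC" * 100
features = sliding_window_features(toy, width=60, step=3, shift=0)
shifted = triplet_frequencies(toy[:60], shift=1)
```

On the toy text, reading with `shift=0` sees only `ATG` and `GCC`, while reading with `shift=1` sees `TGG` and `CCA` instead. In a real sequence, the rows of `features` could be passed to any PCA routine and then to K-Means to look for comparable shift-dependent structure.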

Cite (APA)

Gorban, A. N., & Zinovyev, A. Y. (2008). PCA and K-Means decipher genome. In Lecture Notes in Computational Science and Engineering (Vol. 58, pp. 309–323). https://doi.org/10.1007/978-3-540-73750-6_14
