In this paper, we give a tutorial for undergraduate students of statistical methods and/or bioinformatics. The students learn how data visualization can help in genomic sequence analysis. They start with a fragment of the genetic text of a bacterial genome and analyze its structure. By means of principal component analysis (PCA) they "discover" that the information in the genome is encoded by non-overlapping triplets. Next, they learn how to find gene positions. This exercise on PCA and K-Means clustering enables active study of basic bioinformatics notions. The Appendix contains the program listings that accompany this exercise.
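As a rough illustration of the kind of input such an exercise needs, the following minimal sketch turns a genetic text into a table of non-overlapping triplet frequencies, one row per fragment. It is not taken from the paper's Appendix. The function names, the fragment width of 300 letters, and the restriction to the alphabet A, C, G, T are all assumptions made for this example.

```python
from collections import Counter
from itertools import product

# Assumption: the genetic text uses only the four letters A, C, G, T.
ALPHABET = "ACGT"
TRIPLETS = ["".join(p) for p in product(ALPHABET, repeat=3)]  # 64 possible triplets


def split_into_fragments(sequence, width=300):
    """Cut the text into consecutive, non-overlapping fragments of `width` letters.

    The width of 300 is an illustrative assumption; a trailing piece shorter
    than `width` is dropped.
    """
    return [sequence[i:i + width] for i in range(0, len(sequence) - width + 1, width)]


def triplet_frequencies(fragment):
    """Return a 64-element vector of non-overlapping triplet frequencies.

    Triplets are read at positions 0, 3, 6, ... of the fragment. Triplets that
    contain letters outside ALPHABET are ignored.
    """
    counts = Counter(fragment[i:i + 3] for i in range(0, len(fragment) - 2, 3))
    total = sum(counts[t] for t in TRIPLETS)
    if total == 0:
        return [0.0] * len(TRIPLETS)
    return [counts[t] / total for t in TRIPLETS]


def build_feature_table(sequence, width=300):
    """One row of 64 triplet frequencies per fragment of the text."""
    return [triplet_frequencies(f) for f in split_into_fragments(sequence, width)]
```

Each row of the resulting table is a point in a 64-dimensional space. Under the assumptions above, such a table is the kind of data set that can be passed to any PCA implementation for visualization and then to K-Means for clustering.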
Gorban, A. N., & Zinovyev, A. Y. (2008). PCA and K-Means decipher genome. In Lecture Notes in Computational Science and Engineering (Vol. 58, pp. 309–323). https://doi.org/10.1007/978-3-540-73750-6_14