A novel algorithm for automatic species identification using principal component analysis

Abstract

This paper describes a novel scheme for automatically identifying a species from its genomic data. Random samples of a fixed length (10,000 elements) are taken from the genome sequence of a given species. A set of 64 keywords is generated from all possible 3-tuples of the four letters A (adenine), T (thymine), C (cytosine) and G (guanine), which represent the four types of nucleotide base in a DNA strand. Each sample is searched for these 4³ = 64 keywords and their frequencies of occurrence are recorded. Repeating this process for N randomly selected samples from the genome sequence yields an N × 64 matrix of frequency counts. Principal Component Analysis is then applied to this matrix to obtain a feature descriptor of reduced dimension (1 × 64). Comparing the feature descriptors of different species, and of different samples from the same species, shows that a descriptor is unique to a particular species, while the descriptors of different species differ widely. Because the variance of the descriptors for a given genome sequence is negligible, the proposed scheme has wide application in automatic species identification. © Springer-Verlag Berlin Heidelberg 2005.
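
The pipeline in the abstract (64 keywords, N random samples, an N × 64 count matrix, then PCA) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, keyword occurrences are assumed to be counted in overlapping windows, the 1 × 64 descriptor is assumed to be the leading principal axis of the count matrix, and power iteration stands in for a full eigendecomposition so that only the Python standard library is needed.

```python
import itertools
import random

BASES = "ATCG"
# All 4^3 = 64 three-letter keywords, in a fixed order.
KEYWORDS = ["".join(p) for p in itertools.product(BASES, repeat=3)]
INDEX = {kw: i for i, kw in enumerate(KEYWORDS)}


def keyword_counts(sample):
    """Count overlapping occurrences of each keyword in one sample (assumption)."""
    counts = [0] * len(KEYWORDS)
    for i in range(len(sample) - 2):
        j = INDEX.get(sample[i:i + 3])
        if j is not None:  # skip windows containing symbols other than A, T, C, G
            counts[j] += 1
    return counts


def frequency_matrix(genome, n_samples, sample_len, rng):
    """Build the N x 64 count matrix from randomly placed samples."""
    rows = []
    for _ in range(n_samples):
        start = rng.randrange(len(genome) - sample_len + 1)
        rows.append(keyword_counts(genome[start:start + sample_len]))
    return rows


def leading_component(rows, rng, iterations=200):
    """Unit-length leading principal axis of the rows, via power iteration."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Random start: a uniform vector would be orthogonal to the centered rows,
    # because every row of counts has the same total.
    v = [rng.random() - 0.5 for _ in range(d)]
    norm = sum(x * x for x in v) ** 0.5
    v = [x / norm for x in v]
    for _ in range(iterations):
        # w = X^T (X v), which is proportional to the covariance matrix times v.
        proj = [sum(c[j] * v[j] for j in range(d)) for c in centered]
        w = [sum(proj[i] * centered[i][j] for i in range(n)) for j in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:  # no variance left to follow
            break
        v = [x / norm for x in w]
    return v
```

With a genome held in a string, `leading_component(frequency_matrix(genome, N, 10000, rng), rng)` would give one 64-element descriptor. The sign of a principal axis is arbitrary, so two descriptors are better compared through the absolute value of their dot product than element by element.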

Sen, S., Narasimhan, S., & Konar, A. (2005). A novel algorithm for automatic species identification using principal component analysis. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 3776 LNCS, pp. 605–610). Springer Verlag. https://doi.org/10.1007/11590316_96
