Clustering the normalized compression distance for influenza virus data

Kimihito Ito; Thomas Zeugmann; Yu Zhu

Conference Proceedings, Open Access

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2010) 6060 LNCS 130-146

DOI: 10.1007/978-3-642-12476-1_9

Abstract

The present paper analyzes how useful the normalized compression distance (NCD) is for clustering the hemagglutinin (HA) sequences of influenza virus data for the HA gene, depending on the compressor used. Using the CompLearn Toolkit, the built-in compressors zlib and bzip2 are compared. Moreover, hierarchical clustering is compared with spectral clustering. For hierarchical clustering, the function hclust from R is used, and spectral clustering is done via the kLines algorithm proposed by Fischer and Poland (2004). Our results are very promising and show that an (almost) perfect clustering can be obtained. The zlib compressor yielded better results than the bzip2 compressor, and when all data are considered, hierarchical clustering performs slightly better than spectral clustering via kLines. © Springer-Verlag Berlin Heidelberg 2010.
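To make the clustered quantity concrete, the following is a minimal sketch of a pairwise NCD matrix, assuming the standard NCD formula NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C(s) is the compressed length of s. Python's standard `zlib` and `bz2` modules stand in for CompLearn's built-in compressors; the compression level, the mirroring of the matrix, and the helper names `compressed_size`, `ncd`, and `ncd_matrix` are assumptions for illustration, so the values will not match those computed by CompLearn. The toy sequences are hypothetical and are not influenza data.

```python
import bz2
import random
import zlib


def compressed_size(data: bytes, compressor: str) -> int:
    """Length in bytes of `data` after compression (level 9 is an assumption)."""
    if compressor == "zlib":
        return len(zlib.compress(data, 9))
    if compressor == "bzip2":
        return len(bz2.compress(data, 9))
    raise ValueError(f"unknown compressor: {compressor}")


def ncd(x: bytes, y: bytes, compressor: str = "zlib") -> float:
    """NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y))."""
    cx = compressed_size(x, compressor)
    cy = compressed_size(y, compressor)
    cxy = compressed_size(x + y, compressor)
    return (cxy - min(cx, cy)) / max(cx, cy)


def ncd_matrix(seqs, compressor="zlib"):
    """Pairwise NCD matrix. Real compressors make NCD only approximately
    symmetric, so this sketch computes it once per unordered pair (i < j)
    and mirrors it; the diagonal is set to 0 by convention."""
    n = len(seqs)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = ncd(seqs[i], seqs[j], compressor)
    return d


# Hypothetical toy data (NOT influenza sequences): a random nucleotide
# string, a copy with five point mutations, and an unrelated random string.
rng = random.Random(0)
base = "".join(rng.choice("ACGT") for _ in range(500))
mutant = list(base)
for pos in (50, 150, 250, 350, 450):
    # Always substitute a different nucleotide at this position.
    mutant[pos] = "A" if mutant[pos] != "A" else "C"
mutant = "".join(mutant)
other = "".join(rng.choice("ACGT") for _ in range(500))
toy = [s.encode("ascii") for s in (base, mutant, other)]

for name in ("zlib", "bzip2"):
    d = ncd_matrix(toy, name)
    print(name, [round(v, 3) for v in d[0]])
```

The near-copy should end up much closer to the base string than the unrelated string does, under either compressor. In R, a matrix of this kind could be converted with `as.dist` and passed to `hclust` for the hierarchical clustering step.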

Citation (APA):

Ito, K., Zeugmann, T., & Zhu, Y. (2010). Clustering the normalized compression distance for influenza virus data. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 6060 LNCS, pp. 130–146). https://doi.org/10.1007/978-3-642-12476-1_9