Clustering protein structures with Hadoop

Giacomo Paschina; Luca Roverelli; Daniele D’Agostino; Federica Chiappori; Ivan Merelli

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2016) 9874 LNCS 141-153

DOI: 10.1007/978-3-319-44332-4_11

Abstract

Machine learning is widely used in structural biology: large conformational ensembles originating from single protein structures (e.g. derived from NMR experiments or molecular dynamics simulations) can be analysed by partitioning the original dataset into sensible subsets, revealing important structural and dynamic behaviours. Clustering is a well-suited unsupervised approach for these ensembles, since it identifies stable conformations and the driving characteristics shared by the different structures. A common problem of applications that implement protein clustering is performance scalability, in particular with respect to loading the data into memory. In this work we show how the parallel performance of the GROMOS clustering algorithm can be improved by using Hadoop. The preliminary results support the validity of this approach and provide a hint for future development in this field.
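
For orientation, the GROMOS clustering procedure named in the abstract is commonly described as a greedy neighbour-counting scheme: the structure with the most neighbours within a distance cutoff becomes a cluster centre, it and its neighbours are removed from the pool, and the step repeats until no structures remain. The following is a minimal serial sketch in Python under that assumption. It does not reproduce the authors' Hadoop implementation; the function name `gromos_cluster`, the precomputed distance matrix, and the cutoff value are illustrative choices, not taken from the paper.

```python
import numpy as np

def gromos_cluster(dist, cutoff):
    """Greedy GROMOS-style clustering on a precomputed pairwise
    distance matrix (e.g. RMSD between structures).

    Returns a list of (centre_index, member_indices) tuples.
    Illustrative sketch only: serial, and the full matrix is in memory.
    """
    dist = np.asarray(dist, dtype=float)
    n = dist.shape[0]
    neighbors = dist <= cutoff          # neighbors[i, j]: j is within cutoff of i
    active = np.ones(n, dtype=bool)     # structures not yet assigned
    clusters = []
    while active.any():
        # Count still-unassigned neighbours of every structure.
        counts = (neighbors & active).sum(axis=1)
        counts[~active] = -1            # assigned structures cannot be centres
        center = int(np.argmax(counts)) # ties resolve to the lowest index
        members = np.flatnonzero(neighbors[center] & active)
        clusters.append((center, members.tolist()))
        active[members] = False         # remove the whole cluster from the pool
    return clusters
```

The sketch keeps the whole distance matrix in memory, which is the kind of data-load constraint the abstract identifies as a scalability problem.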

Cite

Paschina, G., Roverelli, L., D’Agostino, D., Chiappori, F., & Merelli, I. (2016). Clustering protein structures with Hadoop. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9874 LNCS, pp. 141–153). Springer Verlag. https://doi.org/10.1007/978-3-319-44332-4_11
