Efficient transformation of protein sequence databases to columnar index schema

Roman Zoun; Kay Schallert; David Broneske; Ivayla Trifonova; Xiao Chen; Robert Heyer; Dirk Benndorf; Gunter Saake

Conference Proceedings

Communications in Computer and Information Science, vol. 1062, pp. 67–72 (2019)

DOI: 10.1007/978-3-030-27684-3_10

Abstract

Mass spectrometry is used to sequence proteins and extract bio-markers of biological environments. These bio-markers can be used to diagnose thousands of diseases and to optimize biological environments such as bio-gas plants. Indexing the protein sequence data streamlines experiments and speeds up the analysis. In our work, we present a schema for distributed column-based database management systems that uses a column-oriented index to store sequence data. This raises the problem of how to transform the protein sequence data from the standard format into the new schema. We analyze and evaluate four different transformation methods. The results show that our proposed extended radix tree performs best in terms of memory consumption and calculation time. Hence, the radix tree proves to be a suitable data structure for transforming protein sequences into the indexed schema.
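The abstract does not describe the extended radix tree in detail, so the following is only a minimal sketch of a plain radix tree (compressed prefix tree) that maps sequences to the proteins containing them and then emits one index row per distinct sequence. The names `Node`, `insert`, `rows`, and `build_index`, the row layout, and the protein identifiers `P1` to `P3` are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Outgoing edges keyed by the first character of the edge label:
    # char -> (label, child node).
    edges: dict = field(default_factory=dict)
    # Identifiers of proteins whose inserted sequence ends at this node.
    proteins: set = field(default_factory=set)


def common_prefix_len(a, b):
    """Length of the longest common prefix of strings a and b."""
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return n


def insert(root, seq, protein_id):
    """Insert seq into the tree and attach protein_id to its end node."""
    node = root
    while seq:
        entry = node.edges.get(seq[0])
        if entry is None:
            # No edge shares a prefix: store the remainder as one edge.
            child = Node()
            node.edges[seq[0]] = (seq, child)
            node = child
            break
        label, child = entry
        k = common_prefix_len(label, seq)
        if k < len(label):
            # Partial match: split the edge at the divergence point.
            mid = Node()
            mid.edges[label[k]] = (label[k:], child)
            node.edges[seq[0]] = (label[:k], mid)
            child = mid
        node = child
        seq = seq[k:]
    node.proteins.add(protein_id)


def rows(node, prefix=""):
    """Yield (sequence, sorted protein ids) in lexicographic order."""
    if node.proteins:
        yield prefix, sorted(node.proteins)
    for first in sorted(node.edges):
        label, child = node.edges[first]
        yield from rows(child, prefix + label)


def build_index(pairs):
    """Build the tree from (sequence, protein_id) pairs and list its rows."""
    root = Node()
    for seq, protein_id in pairs:
        insert(root, seq, protein_id)
    return list(rows(root))


# Hypothetical input: the same sequence may occur in several proteins.
example = [("MKTAYIAK", "P1"), ("MKTAY", "P2"), ("MKLV", "P1"), ("MKTAYIAK", "P3")]
index_rows = build_index(example)
```

Because shared prefixes are stored once and duplicate sequences collapse into a single node, each distinct sequence yields exactly one row listing all proteins it came from. This is one plausible reason a radix tree suits the transformation, though the memory and runtime results reported in the abstract refer to the authors' own extended variant.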

Citation (APA)

Zoun, R., Schallert, K., Broneske, D., Trifonova, I., Chen, X., Heyer, R., … Saake, G. (2019). Efficient transformation of protein sequence databases to columnar index schema. In Communications in Computer and Information Science (Vol. 1062, pp. 67–72). Springer Verlag. https://doi.org/10.1007/978-3-030-27684-3_10
