Algorithms designed for compressed-gene-data transformation among gene banks with different references

3Citations
Citations of this article
13Readers
Mendeley users who have this article in their library.

This article is free to access.

Abstract

Background: With the reduction of gene sequencing cost and demand for emerging technologies such as precision medical treatment and deep learning in genome, it is an era of gene data outbreaks today. How to store, transmit and analyze these data has become a hotspot in the current research. Now the compression algorithm based on reference is widely used due to its high compression ratio. There exists a big problem that the data from different gene banks can't merge directly and share information efficiently, because these data are usually compressed with different references. The traditional workflow is decompression-and-recompression, which is too simple and time-consuming. We should improve it and speed it up. Results: In this paper, we focus on this problem and propose a set of transformation algorithms to cope with it. We will 1) analyze some different compression algorithms to find the similarities and the differences among all of them, 2) come up with a naïve method named TDM for data transformation between difference gene banks and finally 3) optimize former method TDM and propose the method named TPI and the method named TGI. A number of experiment result proved that the three algorithms we proposed are an order of magnitude faster than traditional decompression-and-recompression workflow. Conclusions: Firstly, the three algorithms we proposed all have good performance in terms of time. Secondly, they have their own different advantages faced with different dataset or situations. TDM and TPI are more suitable for small-scale gene data transformation, while TGI is more suitable for large-scale gene data transformation.

Cite

CITATION STYLE

APA

Luo, Q., Guo, C., Zhang, Y. J., Cai, Y., & Liu, G. (2018). Algorithms designed for compressed-gene-data transformation among gene banks with different references. BMC Bioinformatics, 19(1). https://doi.org/10.1186/s12859-018-2230-2

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free