StackZDPD: a novel encoding scheme for mass spectrometry data optimized for speed and compression ratio

5Citations
Citations of this article
19Readers
Mendeley users who have this article in their library.

This article is free to access.

Abstract

As the pervasive, standardized format for interchange and deposition of raw mass spectrometry (MS) proteomics and metabolomics data, text-based mzML is inefficiently utilized on various analysis platforms due to its sheer volume of samples and limited read/write speed. Most research on compression algorithms rarely provides flexible random file reading scheme. Database-developed solution guarantees the efficiency of random file reading, but nevertheless the efforts in compression and third-party software support are insufficient. Under the premise of ensuring the efficiency of decompression, we propose an encoding scheme “Stack-ZDPD” that is optimized for storage of raw MS data, designed for the format “Aird”, a computation-oriented format with fast accessing and decoding time, where the core compression algorithm is “ZDPD”. Stack-ZDPD reduces the volume of data stored in mzML format by around 80% or more, depending on the data acquisition pattern, and the compression ratio is approximately 30% compared to ZDPD for data generated using Time of Flight technology. Our approach is available on AirdPro, for file conversion and the Java-API Aird-SDK, for data parsing.

Cite

CITATION STYLE

APA

Wang, J., Lu, M., Wang, R., An, S., Xie, C., & Yu, C. (2022). StackZDPD: a novel encoding scheme for mass spectrometry data optimized for speed and compression ratio. Scientific Reports, 12(1). https://doi.org/10.1038/s41598-022-09432-1

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free