Abstract
Extreme-scale simulations and high-resolution instruments have been generating an increasing amount of data, which poses significant challenges not only to data storage during the run but also to post-processing, where data will be repeatedly retrieved and analyzed over a long period of time. The challenge of satisfying a wide range of post-hoc analysis needs while minimizing the I/O overhead caused by inappropriate and/or excessive data retrieval should never be left unmanaged. In this paper, we propose a data refactoring, compressing, and retrieval framework capable of 1) fine-grained data refactoring with regard to precision; 2) incrementally retrieving and recomposing the data in terms of various error bounds; and 3) adaptively retrieving data in multi-precision and multi-resolution with respect to different analyses. With the progressive data re-composition and the adaptable retrieval algorithms, our framework significantly reduces the amount of data retrieved when multiple incremental precisions are requested and/or the downstream analysis time when coarse resolution is used. Experiments show that the amount of data retrieved under the same progressively requested error bound using our framework is 64% less than that using state-of-the-art single-error-bounded approaches. Parallel experiments with up to 1,024 cores and 600 GB data in total show that our approach yields 1.36× and 2.52× performance over existing approaches in writing to and reading from persistent storage systems, respectively.
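The idea of incremental, error-bounded retrieval can be illustrated with a small sketch. This is not the multilevel decomposition used in the paper; it substitutes a much simpler technique, residual quantization, purely to show how a reader might fetch only the additional components needed when a tighter error bound is requested. The names `refactor`, `ProgressiveReader`, and `request` are hypothetical and do not come from the paper or any library.

```python
def refactor(data, error_bounds):
    """Split data into components (assumed scheme: residual quantization).

    error_bounds must be decreasing. After summing the first k components,
    the maximum absolute error is at most error_bounds[k - 1]
    (up to floating-point rounding).
    """
    components = []
    approx = [0.0] * len(data)
    for eb in error_bounds:
        step = 2 * eb  # rounding to the nearest multiple of step errs by <= eb
        comp = [round((x - a) / step) for x, a in zip(data, approx)]
        components.append((step, comp))
        approx = [a + step * q for a, q in zip(approx, comp)]
    return components


class ProgressiveReader:
    """Hypothetical reader that keeps what it has already recomposed."""

    def __init__(self, components, error_bounds):
        self.components = components
        self.error_bounds = error_bounds
        self.loaded = 0  # number of components recomposed so far
        self.approx = [0.0] * len(components[0][1])

    def request(self, eb):
        """Return (approximation, number of components newly fetched)."""
        fetched = 0
        # Fetch while components remain and the current guarantee is looser
        # than eb. With nothing loaded there is no guarantee at all.
        while self.loaded < len(self.components) and (
            self.loaded == 0 or self.error_bounds[self.loaded - 1] > eb
        ):
            step, comp = self.components[self.loaded]
            self.approx = [a + step * q for a, q in zip(self.approx, comp)]
            self.loaded += 1
            fetched += 1
        return self.approx, fetched


def max_error(data, approx):
    return max(abs(x - a) for x, a in zip(data, approx))


data = [0.1234, -3.5, 2.71828, 10.0]
bounds = [1.0, 0.1, 0.01]
reader = ProgressiveReader(refactor(data, bounds), bounds)

approx_coarse, fetched_coarse = reader.request(0.1)   # needs two components
approx_fine, fetched_fine = reader.request(0.01)      # needs only one more
```

In this sketch, tightening the bound from 0.1 to 0.01 fetches one extra component rather than re-reading the data from scratch, which is the behavior the abstract contrasts with single-error-bounded approaches. If the requested bound is tighter than the finest stored bound, the sketch simply returns everything it has.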
Cite
Liang, X., Gong, Q., Chen, J., Whitney, B., Wan, L., Liu, Q., … Klasky, S. (2021). Error-controlled, progressive, and adaptable retrieval of scientific data with multilevel decomposition. In International Conference for High Performance Computing, Networking, Storage and Analysis, SC. IEEE Computer Society. https://doi.org/10.1145/3458817.3476179