Enabling near-data processing in distributed object storage systems

Abstract

Most general-purpose distributed storage systems are not designed with near-data processing (NDP) in mind. They do not respect semantic data boundaries when writing data; for example, they may split a record across servers. This reduces NDP effectiveness because data must be collated before computation. While semantic data awareness and NDP functions can be retroactively added to existing distributed storage, doing so is often complex and difficult in practice. We propose sharing storage system layout information with data writers so they can adjust data layouts to prevent alignment issues regardless of the underlying architecture. This simplifies NDP implementation by reducing the need for data reassembly and for complex storage-system or application extensions. We demonstrate a hinting mechanism on both HDFS with computational block storage and an erasure-coded MinIO deployment, reducing data movement by up to 99% when querying CSV data with NDP co-located with the stored data. This was accomplished purely with client-side data alignment, with no modifications to server-side write paths and no inter-node collation of data.
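The abstract does not spell out the hinting interface, so the following is only a minimal sketch of the client-side alignment idea. It assumes the writer has learned a fixed block size from a layout hint (for example an HDFS block size or an erasure-coding chunk size), that CSV records are newline-terminated, and that padding a block's tail with blank lines is acceptable to the reader. The functions `align_records`, `split_blocks`, and `count_matching` are hypothetical names for illustration, not the authors' implementation.

```python
import csv
import io
from typing import Callable, Iterable, List


def align_records(records: Iterable[bytes], block_size: int, pad: bytes = b"\n") -> bytes:
    """Pack newline-terminated records so that no record crosses a block boundary.

    block_size is assumed to come from a storage layout hint (hypothetical);
    padding with blank lines is an assumed convention, not taken from the paper.
    """
    if len(pad) != 1:
        raise ValueError("pad must be a single byte")
    out = bytearray()
    for rec in records:
        if len(rec) > block_size:
            raise ValueError("record is larger than a block and cannot be aligned")
        used = len(out) % block_size
        if used + len(rec) > block_size:
            # Fill the remainder of the current block so the record starts a new one.
            out += pad * (block_size - used)
        out += rec
    return bytes(out)


def split_blocks(data: bytes, block_size: int) -> List[bytes]:
    # Models the storage system cutting the byte stream into fixed-size blocks.
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def count_matching(block: bytes, predicate: Callable[[List[str]], bool]) -> int:
    # Simulates an NDP filter that runs against one block in isolation.
    rows = csv.reader(io.StringIO(block.decode("utf-8")))
    # csv.reader yields [] for blank (padding) lines, so skip empty rows.
    return sum(1 for row in rows if row and predicate(row))


if __name__ == "__main__":
    records = [b"1,alice,30\n", b"2,bob,25\n", b"3,carol,41\n"]
    blocks = split_blocks(align_records(records, block_size=16), 16)
    # Each block is filtered independently; only the per-block counts are combined.
    print(sum(count_matching(b, lambda r: int(r[2]) > 28) for b in blocks))
```

Without alignment, simply concatenating these records and cutting at 16 bytes would end the first block mid-record (`2,bob`), so a per-block filter would need data from a neighboring block. The padding trades a small amount of space for blocks that each parse on their own.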

Citation (APA)

Adams, I. F., Agrawal, N., & Mesnier, M. P. (2021). Enabling near-data processing in distributed object storage systems. In HotStorage 2021 - Proceedings of the 13th ACM Workshop on Hot Topics in Storage and File Systems (pp. 28–34). Association for Computing Machinery, Inc. https://doi.org/10.1145/3465332.3470881