Abstract
Recent hardware trends have dramatically reduced the price of RAM and shifted the focus from systems operating on disk-resident data to in-memory solutions. In this environment, high memory access latency, commonly known as the memory wall, becomes the main data processing bottleneck. Traditional CPU-based architectures address this problem with large cache hierarchies, but algorithms with poor locality limit the benefits of caching. Hardware multithreading, in contrast, offers a generic solution that does not rely on algorithm-specific locality properties. In this paper we present an FPGA-accelerated implementation of in-memory group-by hash aggregation. Our design implements a custom operation datapath on the FPGA and relies on hardware multithreading to efficiently mask long memory access latencies. We propose using CAMs (Content Addressable Memories) as a mechanism for synchronization and local pre-aggregation; to the best of our knowledge, this is the first work to use CAMs as a synchronizing cache. We evaluate aggregation throughput against state-of-the-art multithreaded software implementations and demonstrate that the FPGA-accelerated approach significantly outperforms them at large grouping-key cardinalities, yielding speedups of up to 10x.
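To make the operation concrete, below is a minimal software sketch of group-by hash aggregation (conceptually, SELECT key, SUM(val) ... GROUP BY key) with a small fixed-size pre-aggregation cache placed in front of the main hash table, loosely playing the role the abstract assigns to CAMs: merging updates to recently seen keys before they reach the table. This is an illustrative analogy, not the authors' FPGA datapath; the PreAggCache class, its direct-mapped lookup, and the 64-slot capacity are assumptions made for this example (a real CAM performs a fully associative key match in hardware, and in the paper it also synchronizes concurrent updates across hardware threads, which this single-threaded sketch does not capture).

#include <cstddef>
#include <cstdint>
#include <iostream>
#include <unordered_map>
#include <utility>
#include <vector>

// Illustrative software analogy (C++17): group-by hash aggregation with a
// small pre-aggregation cache in front of the main hash table. The cache
// merges updates to recently seen keys locally; slot count and direct-mapped
// lookup are assumptions for this sketch, not the paper's hardware design.

struct CacheEntry {
    uint32_t key = 0;
    uint64_t sum = 0;
    bool valid = false;
};

class PreAggCache {
    static constexpr std::size_t kSlots = 64;  // illustrative capacity
    CacheEntry slots_[kSlots];
    std::unordered_map<uint32_t, uint64_t>& table_;  // main aggregation table

public:
    explicit PreAggCache(std::unordered_map<uint32_t, uint64_t>& table)
        : table_(table) {}

    // Merge (key, val) into the cache; on a slot conflict, spill the resident
    // partial aggregate to the main table and take over the slot.
    void add(uint32_t key, uint64_t val) {
        CacheEntry& e = slots_[key % kSlots];
        if (e.valid && e.key == key) {
            e.sum += val;  // local pre-aggregation: no main-table access
            return;
        }
        if (e.valid) table_[e.key] += e.sum;  // evict partial aggregate
        e.key = key;
        e.sum = val;
        e.valid = true;
    }

    // Push all remaining partial aggregates into the main table.
    void flush() {
        for (CacheEntry& e : slots_) {
            if (e.valid) {
                table_[e.key] += e.sum;
                e.valid = false;
            }
        }
    }
};

int main() {
    // Toy input: (grouping key, value) pairs.
    std::vector<std::pair<uint32_t, uint64_t>> rows = {
        {1, 10}, {2, 5}, {1, 7}, {3, 2}, {2, 1}};

    std::unordered_map<uint32_t, uint64_t> table;
    PreAggCache cache(table);
    for (const auto& [key, val] : rows) cache.add(key, val);
    cache.flush();  // drain partial aggregates before reading results

    for (const auto& [key, sum] : table)
        std::cout << "key=" << key << " sum=" << sum << '\n';  // 1->17, 2->6, 3->2
}

The design intuition is that repeated keys are absorbed by the cache and never touch the (latency-bound) main table, which is why the pre-aggregation effect matters most when the grouping-key stream has short-range repetition.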
Citation
Absalyamov, I., Budhkar, P., Windh, S., Halstead, R. J., Najjar, W. A., & Tsotras, V. J. (2016). FPGA-accelerated group-by aggregation using synchronizing caches. In Proceedings of the ACM SIGMOD International Conference on Management of Data. Association for Computing Machinery. https://doi.org/10.1145/2933349.2933360