Consistency and Completeness: Rethinking Distributed Stream Processing in Apache Kafka

Abstract

An increasingly important requirement for distributed stream processing applications is to provide strong correctness guarantees under unexpected failures and out-of-order data, so that their results can be authoritative (not needing complementary batch results). Although existing systems have put considerable effort into addressing specific issues such as consistency and completeness, enabling users to make flexible and transparent trade-off decisions among correctness, performance, and cost remains a practical challenge. Specifically, similar mechanisms are usually applied to tackle both consistency and completeness, which can result in unnecessary performance penalties. We present Apache Kafka's core design for stream processing, which relies on its persistent log architecture as the storage and inter-processor communication layers to achieve correctness guarantees. Kafka Streams, a scalable stream processing client library in Apache Kafka, defines the processing logic as read-process-write cycles in which all processing state updates and result outputs are captured as log appends. Idempotent and transactional write protocols are used to guarantee exactly-once semantics. Furthermore, revision-based speculative processing is employed to emit results as soon as possible while handling out-of-order data. We also demonstrate how Kafka Streams behaves in practice with large-scale deployments and performance insights that exhibit its flexible, low-overhead trade-offs.
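The idea of revision-based speculative processing mentioned in the abstract can be illustrated with a small toy simulation. The sketch below is not the Kafka Streams API and is not taken from the paper; `WINDOW_SIZE`, `process`, and `latest_view` are hypothetical names chosen for illustration, and plain Python lists stand in for Kafka logs. It assumes a tumbling-window count in which every update is appended to an output log immediately, so a late (out-of-order) record simply produces a revised result for its window rather than being held back or dropped.

```python
from collections import defaultdict

WINDOW_SIZE = 10  # hypothetical tumbling-window length, in event-time units


def window_start(timestamp):
    """Return the start of the tumbling window containing this timestamp."""
    return timestamp - (timestamp % WINDOW_SIZE)


def process(records):
    """Count records per (key, window), emitting a revision on every update.

    `records` is a list of (key, event_timestamp) pairs in arrival order,
    which may differ from event-time order. Each emitted entry is
    (key, window_start, count); a later entry for the same (key, window)
    supersedes the earlier one.
    """
    counts = defaultdict(int)  # stands in for a state store (assumption)
    changelog = []             # stands in for an append-only output log
    for key, ts in records:
        window = window_start(ts)
        counts[(key, window)] += 1
        # Emit right away instead of waiting for the window to "close".
        changelog.append((key, window, counts[(key, window)]))
    return changelog


def latest_view(changelog):
    """Fold the revisions into the most recent count per (key, window)."""
    view = {}
    for key, window, count in changelog:
        view[(key, window)] = count
    return view


# The record with timestamp 3 arrives after the one with timestamp 12.
arrivals = [("a", 1), ("a", 12), ("a", 3)]
log = process(arrivals)
```

In this sketch, a consumer that folds the log with `latest_view(log)` sees the revised count for window 0 once the late record has been processed, while the earlier speculative result was available to it without delay.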

Citation (APA)

Wang, G., Chen, L., Dikshit, A., Gustafson, J., Chen, B., Sax, M. J., … Rao, J. (2021). Consistency and Completeness: Rethinking Distributed Stream Processing in Apache Kafka. In Proceedings of the ACM SIGMOD International Conference on Management of Data (pp. 2602–2613). Association for Computing Machinery. https://doi.org/10.1145/3448016.3457556
