Benchmarking fast-data platforms for the Aadhaar biometric database

Yogesh Simmhan; Anshu Shukla; Arun Verma

Book Chapter

Springer Verlag (2016), pp. 21–39

DOI: 10.1007/978-3-319-49748-8_2

Abstract

Aadhaar is the world’s largest biometric database, with a billion records, and is being compiled as an identity platform to deliver social services to residents of India. Aadhaar processes streams of biometric data as residents are enrolled and their records updated. Besides ∼1 million enrollments and updates per day, up to 100 million daily biometric authentications are expected during delivery of various public services. These form critical Big Data applications, with large volumes and high velocity of data. Here, we propose a stream processing workload, based on the Aadhaar enrollment and authentication applications, as a Big Data benchmark for distributed stream processing systems. We describe the composition of these applications and characterize their task latencies and selectivity, as well as their data rate and size distributions, based on real observations. We also validate this benchmark on Apache Storm using synthetic streams and simulated application logic. This paper offers a unique glimpse into an operational national identity infrastructure, and proposes a benchmark for “fast data” platforms to support such eGovernance applications.

Cite

APA

Simmhan, Y., Shukla, A., & Verma, A. (2016). Benchmarking fast-data platforms for the Aadhaar biometric database. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10044, pp. 21–39). Springer Verlag. https://doi.org/10.1007/978-3-319-49748-8_2
