Partition Aware Duplicate Records Detection (PADRD) Methodology in Big Data - Decision Support Systems

Abstract

Today, big data analytics and business intelligence (BI) decision support systems (DSS) are vital pillars of leadership ability, translating raw data into intelligence in order to make the ‘right decision at the right time’ and to share the ‘right decision with the right people’. A DSS is often challenged to process massive volumes of data (terabytes, petabytes, exabytes, zettabytes, etc.) and to overcome issues such as data quality, scalability, storage and query performance. DSS failure was one of the reasons clearly highlighted by the United States Senate report on the 2008 American economic collapse. With these issues in mind, this work explores a preventive methodology for the “Data Quality - Duplicates” dimension with optimized query performance in the big data era. In detail, the BI team periodically (daily, weekly, monthly, quarterly, half-yearly) extracts historical operational structured data (the data feed) from multiple sources and loads it into its repository for analytics and reporting. During this load, unintended insertion of duplicate data feeds occurs because of lack of expertise, lack of history, or missing integrity constraints, which affects the error ratio of intelligence reporting and, in turn, leadership ability. Hence the need arises to prevent the unintentional injection of data quality issues. Overall, this paper proposes a methodology to “Improve the Data Accuracy” by detecting duplicate records between the big data repository and the data feed before the data load, with “Optimized Query Performance” through partition-aware search query generation and “Faster Data Block Address Search” through braided B+ tree indexing.
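
The idea of checking a data feed against the repository before loading, while searching only the partitions the feed touches, can be illustrated with a minimal sketch. This is not the authors' implementation: the record layout, the month-based partition key, and the names `repository`, `partition_of` and `find_duplicates` are all assumptions made for illustration, and a plain Python set stands in for the braided B+ tree index described in the abstract.

```python
from collections import defaultdict

# Hypothetical in-memory stand-in for a partitioned repository:
# partition key (assumed here to be a load month) -> records already loaded.
repository = {
    "2018-01": {("C001", "2018-01-15", 250.0), ("C002", "2018-01-20", 90.0)},
    "2018-02": {("C001", "2018-02-03", 120.0)},
}

def partition_of(record):
    """Derive the partition key from a record (assumed: month of the date field)."""
    return record[1][:7]

def find_duplicates(feed, repo):
    """Return feed records that already exist in the repository.

    Only partitions touched by the feed are searched, which mimics the
    pruning effect of a partition-aware search query.
    """
    by_partition = defaultdict(list)
    for record in feed:
        by_partition[partition_of(record)].append(record)

    duplicates = []
    for key, records in by_partition.items():
        existing = repo.get(key, set())  # partitions not in the feed are never read
        duplicates.extend(r for r in records if r in existing)
    return duplicates

feed = [
    ("C001", "2018-02-03", 120.0),  # already present in the repository
    ("C003", "2018-02-10", 75.0),   # new record
]
duplicates = find_duplicates(feed, repository)
to_load = [r for r in feed if r not in duplicates]
```

In this sketch only the "2018-02" partition is consulted, so the cost of the check grows with the size of the feed's partitions and not with the whole repository.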

Kirubakaran, A., & Murugaiyan, A. (2018). Partition Aware Duplicate Records Detection (PADRD) Methodology in Big Data - Decision Support Systems. In Communications in Computer and Information Science (Vol. 804, pp. 86–98). Springer Verlag. https://doi.org/10.1007/978-981-10-8603-8_8
