Corroborating quality of data through density information

Abstract

Unclean data is a common data quality problem, and a dataset often suffers from several quality issues at once. Over time, data may become obsolete, transcription errors may make values inaccurate, and integrating multiple data sources may leave several tuples that refer to the same entity. A cleaning process is therefore necessary, yet fixing a single issue rarely cleans the data, relying on human effort to correct it is expensive, and master data or training data may not be available. This paper studies the data cleaning problem by introducing techniques based on corroboration, i.e., taking the trustworthiness of attribute values into consideration. It presents a data deduplication approach for data whose duplicated tuples contain outdated and inaccurate values, using the density information embedded in the tuples to guide the cleaning process. The paper introduces a framework and algorithms that integrate data deduplication with data currency and accuracy, fixing multiple data quality issues without relying on manual user interaction, master data, or training data.
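To make the idea concrete, the following is a minimal Python sketch (not taken from the paper) of how density information across a cluster of duplicate tuples could be used to corroborate attribute values: the value with the highest support in the cluster is treated as the most trustworthy. The function name, the record fields, and the simple majority-vote scoring are illustrative assumptions, not the authors' published algorithm.

```python
# Hypothetical sketch of density-based corroboration for a cluster of
# duplicate tuples; attribute names and the majority-vote scoring rule
# are illustrative assumptions, not the authors' algorithm.
from collections import Counter

def corroborate(cluster, attributes):
    """Fuse a cluster of duplicate tuples into one cleaned tuple.

    For each attribute, the value occurring most often across the
    cluster (the densest value) is treated as the most trustworthy.
    """
    fused = {}
    for attr in attributes:
        values = [t[attr] for t in cluster if t.get(attr) not in (None, "")]
        if not values:
            fused[attr] = None
            continue
        density = Counter(values)              # support of each candidate value
        fused[attr] = density.most_common(1)[0][0]
    return fused

# Example: three records describing the same person, with an outdated
# city value and a transcription error in the name.
cluster = [
    {"name": "J. Smith",   "city": "Toronto",  "phone": "555-0100"},
    {"name": "John Smith", "city": "Toronto",  "phone": "555-0100"},
    {"name": "John Smith", "city": "Hamilton", "phone": "555-0100"},
]
print(corroborate(cluster, ["name", "city", "phone"]))
# {'name': 'John Smith', 'city': 'Toronto', 'phone': '555-0100'}
```

In the paper's framework, this kind of density signal would additionally be combined with currency and accuracy considerations rather than a plain frequency count.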

Citation (APA)

Al-janabi, S., & Janicki, R. (2018). Corroborating quality of data through density information. In Lecture Notes in Networks and Systems (Vol. 15, pp. 1128–1146). Springer. https://doi.org/10.1007/978-3-319-56994-9_78
