Corroborating quality of data through density information

Abstract

Unclean data is a common data quality problem, and a dataset often suffers from several quality issues at once. Over time, data may become obsolete, transcription errors may make values inaccurate, and integrating multiple data sources may leave several tuples that refer to the same entity. A cleaning process is therefore necessary, yet fixing a single issue rarely cleans the data, relying on human effort to correct it is expensive, and master data or training data may not be available. This paper studies the data cleaning problem by introducing techniques based on corroboration, i.e., taking the trustworthiness of attribute values into consideration. It presents a data deduplication approach for data whose duplicated tuples contain outdated and inaccurate values, using the density information embedded in the tuples to guide the cleaning process. The paper introduces a framework and algorithms that integrate data deduplication with data currency and accuracy, fixing multiple data quality issues without relying on manual user interaction, master data, or training data.
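To make the idea concrete, the following is a minimal Python sketch (not taken from the paper) of how density information across a cluster of duplicate tuples could be used to corroborate attribute values: the value with the highest support in the cluster is treated as the most trustworthy. The function name, the record fields, and the simple majority-vote scoring are illustrative assumptions, not the authors' published algorithm.

```python
# Hypothetical sketch of density-based corroboration for a cluster of
# duplicate tuples; attribute names and the majority-vote scoring rule
# are illustrative assumptions, not the authors' algorithm.
from collections import Counter

def corroborate(cluster, attributes):
    """Fuse a cluster of duplicate tuples into one cleaned tuple.

    For each attribute, the value occurring most often across the
    cluster (the densest value) is treated as the most trustworthy.
    """
    fused = {}
    for attr in attributes:
        values = [t[attr] for t in cluster if t.get(attr) not in (None, "")]
        if not values:
            fused[attr] = None
            continue
        density = Counter(values)              # support of each candidate value
        fused[attr] = density.most_common(1)[0][0]
    return fused

# Example: three records describing the same person, with an outdated
# city value and a transcription error in the name.
cluster = [
    {"name": "J. Smith",   "city": "Toronto",  "phone": "555-0100"},
    {"name": "John Smith", "city": "Toronto",  "phone": "555-0100"},
    {"name": "John Smith", "city": "Hamilton", "phone": "555-0100"},
]
print(corroborate(cluster, ["name", "city", "phone"]))
# {'name': 'John Smith', 'city': 'Toronto', 'phone': '555-0100'}
```

In the paper's framework, this kind of density signal would additionally be combined with currency and accuracy considerations rather than a plain frequency count.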

Citation (APA)

Al-janabi, S., & Janicki, R. (2018). Corroborating quality of data through density information. In Lecture Notes in Networks and Systems (Vol. 15, pp. 1128–1146). Springer. https://doi.org/10.1007/978-3-319-56994-9_78
