DWCLEANSER: A framework for approximate duplicate detection

Abstract

Data quality has become a major area of concern in data warehousing. The prime aim of a data warehouse is to store quality data so that it can effectively enhance decision support systems. Data quality is improved by employing data cleaning techniques. Data cleaning deals with detecting and removing errors and discrepancies from data. This paper presents a novel framework for detecting both exact and approximate duplicates in a data warehouse. The proposed approach reduces the complexity of previously designed frameworks by providing efficient data cleaning techniques. In addition, appropriate methods have been framed to manage outliers and missing values in the datasets. Moreover, comprehensive repositories have been provided that will be useful in incremental data cleaning. © 2011 Springer-Verlag.
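
The abstract does not spell out the matching procedure, so the following is only a minimal Python sketch of the general technique the title names: flagging two records as approximate duplicates when their mean field-level string similarity crosses a threshold. The field names, the 0.85 threshold, and the choice of difflib's SequenceMatcher as the similarity measure are illustrative assumptions, not the authors' method.

    from difflib import SequenceMatcher

    def field_similarity(a: str, b: str) -> float:
        # Normalized similarity in [0, 1]; 1.0 means the strings match exactly.
        return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()

    def is_approximate_duplicate(rec_a: dict, rec_b: dict, fields, threshold: float = 0.85) -> bool:
        # Treat two records as approximate duplicates when the mean
        # per-field similarity meets or exceeds the (assumed) threshold.
        scores = [field_similarity(str(rec_a[f]), str(rec_b[f])) for f in fields]
        return sum(scores) / len(scores) >= threshold

    # Usage: two customer records that differ only by a typo in the name.
    r1 = {"name": "Jon Smith", "city": "New York"}
    r2 = {"name": "John Smith", "city": "New York"}
    print(is_approximate_duplicate(r1, r2, fields=["name", "city"]))  # True

Exact duplicate detection is the special case in which every field similarity equals 1.0, so a single threshold-based comparison covers both cases the framework targets.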

Citation (APA)

Thakur, G., Singh, M., Pahwa, P., & Tyagi, N. (2011). DWCLEANSER: A framework for approximate duplicate detection. In Communications in Computer and Information Science (Vol. 198, pp. 355–364). Springer. https://doi.org/10.1007/978-3-642-22555-0_37
