DWCLEANSER: A framework for approximate duplicate detection

Abstract

Data quality has become a major area of concern in data warehousing. The prime aim of a data warehouse is to store quality data so that it can effectively enhance decision support systems. Data quality is improved by employing data cleaning techniques. Data cleaning deals with detecting and removing errors and discrepancies from data. This paper presents a novel framework for detecting both exact and approximate duplicates in a data warehouse. The proposed approach reduces the complexity of previously designed frameworks by providing efficient data cleaning techniques. In addition, appropriate methods have been framed to manage outliers and missing values in the datasets. Moreover, comprehensive repositories have been provided that will be useful in incremental data cleaning. © 2011 Springer-Verlag.
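
The abstract does not spell out the matching procedure, so the following is only a minimal Python sketch of the general technique the title names: flagging two records as approximate duplicates when their mean field-level string similarity crosses a threshold. The field names, the 0.85 threshold, and the choice of difflib's SequenceMatcher as the similarity measure are illustrative assumptions, not the authors' method.

    from difflib import SequenceMatcher

    def field_similarity(a: str, b: str) -> float:
        # Normalized similarity in [0, 1]; 1.0 means the strings match exactly.
        return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()

    def is_approximate_duplicate(rec_a: dict, rec_b: dict, fields, threshold: float = 0.85) -> bool:
        # Treat two records as approximate duplicates when the mean
        # per-field similarity meets or exceeds the (assumed) threshold.
        scores = [field_similarity(str(rec_a[f]), str(rec_b[f])) for f in fields]
        return sum(scores) / len(scores) >= threshold

    # Usage: two customer records that differ only by a typo in the name.
    r1 = {"name": "Jon Smith", "city": "New York"}
    r2 = {"name": "John Smith", "city": "New York"}
    print(is_approximate_duplicate(r1, r2, fields=["name", "city"]))  # True

Exact duplicate detection is the special case in which every field similarity equals 1.0, so a single threshold-based comparison covers both cases the framework targets.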

Citation (APA)

Thakur, G., Singh, M., Pahwa, P., & Tyagi, N. (2011). DWCLEANSER: A framework for approximate duplicate detection. In Communications in Computer and Information Science (Vol. 198, pp. 355–364). Springer. https://doi.org/10.1007/978-3-642-22555-0_37
