This chapter analyzes the problem of data cleansing and the identification of potential errors in data sets. It surveys the differing views of data cleansing and gives a brief overview of existing data cleansing tools. It then presents a general framework for the data cleansing process, together with a set of general methods that can be used to address the problem: statistical outlier detection, pattern matching, clustering, and data mining techniques. Experimental results from applying these methods to a real-world data set are reported. Finally, the chapter discusses research directions needed to further address the data cleansing problem.
CITATION STYLE
Data Mining and Knowledge Discovery Handbook. (2010). Data Mining and Knowledge Discovery Handbook. Springer US. https://doi.org/10.1007/978-0-387-09823-4