A method for automatic discovery of reference data

Lukasz Ciszak

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2009), Vol. 5579 LNAI, pp. 797–805

DOI: 10.1007/978-3-642-02568-6_81

Abstract

Data quality assessment consists of several phases, the first of which is data profiling. This step produces the most current metadata describing the examined data set. We present a method for the automatic discovery of reference data for textual attributes. The method combines a textual similarity approach with the characteristics of the attribute value distribution. It can discover correct reference data values even when the data contain a large number of impurities. Experiments on real address data show that the method can effectively discover the current reference data. © 2009 Springer Berlin Heidelberg.
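
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea it names: combining textual similarity with the value distribution of an attribute. It assumes that frequent values are more likely to be correct than rare ones and that impurities are mostly small misspellings. The function names, the 0.8 threshold, the sample city names, and the use of `difflib.SequenceMatcher` as the similarity measure are all illustrative assumptions, not the paper's method.

```python
from collections import Counter
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a similarity score in [0, 1] for two strings, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def discover_reference_values(values, threshold=0.8):
    """Split observed values into reference values and likely impurities.

    Hypothetical helper: values are visited from most to least frequent.
    A value that resembles an already accepted, more frequent value is
    mapped to it; otherwise it is accepted as a new reference value.
    """
    counts = Counter(values)
    reference = []    # accepted reference values, most frequent first
    corrections = {}  # impure value -> reference value it resembles
    for value, _ in counts.most_common():
        best = max(reference, key=lambda r: similarity(value, r), default=None)
        if best is not None and similarity(value, best) >= threshold:
            corrections[value] = best
        else:
            reference.append(value)
    return reference, corrections


# Made-up sample data: three city names plus two misspellings.
cities = (
    ["Warsaw"] * 5 + ["Warsav"]
    + ["Krakow"] * 4 + ["Krakwo"]
    + ["Gdansk"] * 3
)
reference, corrections = discover_reference_values(cities)
print(reference)
print(corrections)
```

A sketch like this fails when a misspelling is more frequent than the correct value, or when two legitimate values are textually close. A practical method would need additional evidence from the value distribution to handle those cases.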

APA

Ciszak, L. (2009). A method for automatic discovery of reference data. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 5579 LNAI, pp. 797–805). https://doi.org/10.1007/978-3-642-02568-6_81
