Randomness of data quality artifacts

Abstract

Data quality is often measured by counting artifacts. While this procedure is simple and applicable to many different types of artifacts, such as errors, inconsistencies, and missing values, counts do not differentiate between different distributions of data artifacts. A possible solution is to add a randomness measure indicating how randomly data artifacts are distributed. It has been proposed to calculate randomness by means of the Lempel-Ziv complexity algorithm, but this approach comes with some demerits. Most importantly, the Lempel-Ziv approach assumes an implicit order among data objects, and the measured randomness depends on this order. To overcome this problem, a new method is proposed that measures randomness as proportional to the average number of bits needed to compress, by means of unary coding, the bit matrix that marks the artifacts in a database relation. It is shown that this method has several interesting properties that align the proposed measurement procedure with the intuitive perception of randomness.
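To make the idea concrete, the following Python sketch illustrates one plausible reading of the unary-coding measure described above: the bit matrix flagging artifacts is run-length encoded with unary codes (a run of length n costs n + 1 bits), and the average number of bits per cell serves as the randomness score. All function names and coding details here are illustrative assumptions, not the authors' exact algorithm; in particular, the row-major flattening reintroduces an order over cells, whereas the paper's construction addresses order dependence.

```python
# Hypothetical sketch of a unary-coding randomness measure for a bit
# matrix of data-quality artifacts. Illustrative only; the paper's
# exact procedure may differ.

def unary_code_length(run_length: int) -> int:
    """Bits to encode a run of length n in unary: n ones plus a
    terminating zero, i.e. n + 1 bits."""
    return run_length + 1

def randomness(bit_matrix: list[list[int]]) -> float:
    """Average bits per cell when the flattened matrix is run-length
    encoded with unary codes. Clustered artifacts form long runs and
    compress well; randomly scattered artifacts produce many short
    runs, so the score grows with randomness."""
    flat = [b for row in bit_matrix for b in row]
    if not flat:
        return 0.0
    total_bits, run = 0, 1
    for prev, cur in zip(flat, flat[1:]):
        if cur == prev:
            run += 1
        else:
            total_bits += unary_code_length(run)
            run = 1
    total_bits += unary_code_length(run)  # close the final run
    return total_bits / len(flat)

if __name__ == "__main__":
    clustered = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 0, 0]]
    scattered = [[1, 0, 0, 1], [0, 1, 0, 0], [1, 0, 1, 0]]
    print(randomness(clustered))  # few long runs -> lower score
    print(randomness(scattered))  # many short runs -> higher score
```

Under this reading, two relations with the same artifact count can receive different scores, which is exactly the distinction a plain count cannot make.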


Cite

APA

Boeckling, T., Bronselaer, A., & De Tré, G. (2018). Randomness of data quality artifacts. In Communications in Computer and Information Science (Vol. 855, pp. 529–540). Springer Verlag. https://doi.org/10.1007/978-3-319-91479-4_44
