Cleenex: Support for User Involvement during an Iterative Data Cleaning Process

1Citations
Citations of this article
10Readers
Mendeley users who have this article in their library.

Abstract

The existence of large amounts of data increases the probability of occurring data quality problems. A data cleaning process that corrects these problems is usually an iterative process, because it may need to be re-executed and refined to produce high-quality data. Moreover, due to the specificity of some data quality problems and the limitation of data cleaning programs to cover all problems, often a user has to be involved during the program executions by manually repairing data. However, there is no data cleaning framework that appropriately supports this involvement in such an iterative process, a form of human-in-The-loop, to clean structured data. Moreover, data preparation tools that somehow involve the user in data cleaning processes have not been evaluated with real users to assess their effort.Therefore, we propose Cleenex, a data cleaning framework with support for user involvement during an iterative data cleaning process, and conduct two data cleaning experimental evaluations: An assessment of the Cleenex components that support the user when manually repairing data with a simulated user; and a comparison, in terms of user involvement, of data preparation tools with real users.Results show that Cleenex components reduce the user effort when manually cleaning data during a data cleaning process, for example, the number of tuples visualized is reduced in 99%. Moreover, when performing data cleaning tasks with Cleenex, real users need less time/effort (e.g., half the clicks) and, based on questionnaires, prefer it to the other tools used for comparison, OpenRefine and Pentaho Data Integration.

Cite

CITATION STYLE

APA

Pereira, J. L. M., Fonseca, M. J., Lopes, A., & Galhardas, H. (2024). Cleenex: Support for User Involvement during an Iterative Data Cleaning Process. Journal of Data and Information Quality, 16(1). https://doi.org/10.1145/3648476

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free