Automatic Error Correction Using the Wikipedia Page Revision History

Abstract

Error correction is one of the most crucial and time-consuming steps of data preprocessing. State-of-the-art error correction systems leverage various signals, such as predefined data constraints or user-provided correction examples, to fix erroneous values in a semi-supervised manner. While these approaches reduce human involvement to a few labeled tuples, they still need supervision to fix data errors. In this paper, we propose a novel error correction approach to automatically fix data errors of dirty datasets. Our approach pretrains a set of error corrector models on correction examples extracted from the Wikipedia page revision history. It then fine-tunes these models on the dirty dataset at hand without any required user labels. Finally, our approach aggregates the fine-tuned error corrector models to find the actual correction of each data error. As our experiments show, our approach automatically fixes a large portion of data errors of various dirty datasets with high precision.
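The abstract names three stages: pretraining on correction examples from revision history, label-free fine-tuning on the dirty dataset, and aggregation. The toy sketch below illustrates that flow only and is not the authors' implementation. Every name in it (extract_examples, Corrector, aggregate) is hypothetical, the "models" are simple lookup tables, and "fine-tuning" is approximated by boosting candidate corrections that occur frequently in the dirty column, which is an assumption made for illustration.

```python
from collections import Counter, defaultdict


def extract_examples(revisions):
    """Turn consecutive revisions of one value into (old, new) correction examples."""
    return [(old, new) for old, new in zip(revisions, revisions[1:]) if old != new]


class Corrector:
    """Hypothetical lookup-table corrector; stands in for a learned model."""

    def __init__(self):
        # old value -> Counter of candidate corrections and their scores
        self.scores = defaultdict(Counter)

    def pretrain(self, examples):
        # Count how often each old value was replaced by each new value.
        for old, new in examples:
            self.scores[old][new] += 1
        return self

    def fine_tune(self, column):
        # Label-free adaptation (assumption): candidates that already occur
        # often in the dirty column are more likely to be valid values.
        freq = Counter(column)
        for candidates in self.scores.values():
            for new in candidates:
                candidates[new] += freq[new]
        return self

    def suggest(self, value):
        candidates = self.scores.get(value)
        return candidates.most_common(1)[0][0] if candidates else None


def aggregate(correctors, value):
    """Majority vote over the suggestions of all correctors; None if nobody votes."""
    votes = Counter(
        s for c in correctors if (s := c.suggest(value)) is not None
    )
    return votes.most_common(1)[0][0] if votes else None


# Made-up revision histories, one per corrector.
histories = [
    ["Germny", "Germany"],
    ["Germny", "Germania", "Germany"],
    ["Germny", "Germany"],
]
dirty_column = ["Germany", "Germany", "Germny", "France"]

correctors = [
    Corrector().pretrain(extract_examples(h)).fine_tune(dirty_column)
    for h in histories
]
fixed = aggregate(correctors, "Germny")
```

Here two of the three correctors propose "Germany" for "Germny", so the vote settles on that value. A value that no corrector has seen, such as "France", gets no suggestion and is left alone.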

Cite

Hasan, M. K., & Mahdavi, M. (2021). Automatic Error Correction Using the Wikipedia Page Revision History. In International Conference on Information and Knowledge Management, Proceedings (pp. 3073–3077). Association for Computing Machinery. https://doi.org/10.1145/3459637.3482062
