Empowering domain experts to preprocess massive distributed datasets

Abstract

In recent years, the amount of data has grown substantially. In companies, spreadsheets are a common way to process data and perform statistical analysis. However, spreadsheet applications reach their limits, especially when working with massive amounts of data. To cope with this issue, we introduce a human-in-the-loop approach for scalable data preprocessing using sampling. In contrast to state-of-the-art approaches, we also consider conflict resolution and recommendations based on data not contained in the sample itself. We implemented a fully functional prototype and conducted a user study with 12 participants. We show that our approach delivers significantly better error correction than comparable approaches that consider only the sample dataset.

Cite

Behringer, M., Hirmer, P., Fritz, M., & Mitschang, B. (2020). Empowering domain experts to preprocess massive distributed datasets. In Lecture Notes in Business Information Processing (Vol. 389 LNBIP, pp. 61–75). Springer. https://doi.org/10.1007/978-3-030-53337-3_5
