Empowering domain experts to preprocess massive distributed datasets

Michael Behringer; Pascal Hirmer; Manuel Fritz; Bernhard Mitschang

Conference Proceedings

Lecture Notes in Business Information Processing (2020) 389 LNBIP 61-75

DOI: 10.1007/978-3-030-53337-3_5

5 Citations

3 Readers

Abstract

In recent years, the amount of data has grown extensively. In companies, spreadsheets are a common tool for data processing and statistical analysis. However, spreadsheet applications reach their limits, especially when working with massive amounts of data. To cope with this issue, we introduce a human-in-the-loop approach for scalable data preprocessing using sampling. In contrast to state-of-the-art approaches, we also consider conflict resolution and recommendations based on data not contained in the sample itself. We implemented a fully functional prototype and conducted a user study with 12 participants. We show that our approach delivers significantly higher error correction than comparable approaches that consider only the sample dataset.

Cite (APA)

Behringer, M., Hirmer, P., Fritz, M., & Mitschang, B. (2020). Empowering domain experts to preprocess massive distributed datasets. In Lecture Notes in Business Information Processing (Vol. 389 LNBIP, pp. 61–75). Springer. https://doi.org/10.1007/978-3-030-53337-3_5
