Data Pattern Single Column Analysis for Data Profiling using an Open Source Platform

This article is free to access.

Abstract

Data quality can have a major impact on a company's existing business processes, yet many companies have still to understand its importance. A common data quality problem in many companies in Indonesia is that input data are not filtered, which leads to non-standardized data patterns. This problem can be handled with data preprocessing, one method of which is data profiling. Data profiling is the process of collecting information about data. This research focuses on data profiling using a data pattern method, with an algorithm adopted from OpenRefine and then modified. The profiling results from the open source tools Pentaho Data Integration, Google OpenRefine and Data Cleaner differ markedly: Pentaho Data Integration and Google OpenRefine each found exactly 70 data patterns, whereas Data Cleaner found only 31.
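As a rough illustration of what single-column data pattern profiling means, the following minimal Python sketch maps every value in a column to a character-class pattern and counts how often each pattern occurs. It is not the algorithm from the paper, nor OpenRefine's implementation; the symbol choices (`A`, `a`, `9`) and the helper names `to_pattern` and `profile_column` are assumptions made only for this example, and the sample values are invented.

```python
from collections import Counter


def to_pattern(value: str) -> str:
    """Map each character to a class symbol (assumed convention):
    digit -> '9', uppercase letter -> 'A', other letter -> 'a';
    any other character (spaces, punctuation) is kept as-is."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("9")
        elif ch.isalpha():
            out.append("A" if ch.isupper() else "a")
        else:
            out.append(ch)
    return "".join(out)


def profile_column(values):
    """Count how many values in the column share each pattern."""
    return Counter(to_pattern(v) for v in values)


# Invented sample column: phone-number-like entries with inconsistent formats.
sample = ["0812-345-678", "0813-111-222", "+62 812 345", "n/a"]
patterns = profile_column(sample)

for pattern, count in patterns.most_common():
    print(f"{pattern!r}: {count}")
```

A column with well-standardized input would collapse to one or a few patterns; many distinct patterns suggest unfiltered input. Different tools may use different character classes or grouping rules, which could be one reason pattern counts vary between them.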

Cite

Amethyst, S. R., Kusumasari, T. F., & Hasibuan, M. A. (2018). Data Pattern Single Column Analysis for Data Profiling using an Open Source Platform. In IOP Conference Series: Materials Science and Engineering (Vol. 453). Institute of Physics Publishing. https://doi.org/10.1088/1757-899X/453/1/012024
