Machine learning in data lake for combining data silos

Merlinda Wibowo; Sarina Sulaiman; Siti Mariyam Shamsuddin

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2017) 10387 LNCS 294-306

DOI: 10.1007/978-3-319-61845-6_30

15 Citations

80 Readers

Abstract

A data silo can grow over the years into large-scale data that overlaps and is of indefinite quality. It allows an organization to develop its own analytical capabilities. A data lake can solve this problem efficiently through data analysis using statistical and predictive modeling techniques, which can be applied to enhance and support an organization’s business strategy. This study provides an overview of the process of decision-making, operational efficiency, and creating the solution for an organization. Machine learning can distribute the architecture of the data model and integrate the data silo with other organizations’ data to optimize operational business processes within an organization and improve data quality and efficiency. Testing uses data from Malaysia’s and Singapore’s Government Open Data on the Air Pollutant Index to determine air pollution levels for the health and safety of the population.

Cite

Wibowo, M., Sulaiman, S., & Shamsuddin, S. M. (2017). Machine learning in data lake for combining data silos. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10387 LNCS, pp. 294–306). Springer Verlag. https://doi.org/10.1007/978-3-319-61845-6_30
