Abstract
The conventional extracting-transforming-loading (ETL) system typically runs on a single machine and cannot handle huge volumes of geospatial big data. To deal with the considerable amount of big data in the ETL process, we propose D-ELT (delayed extracting-loading-transforming), which uses MapReduce-based parallelization. Among the various kinds of big data, we concentrate on geospatial big data generated by sensors using Internet of Things (IoT) technology. In the IoT environment, the update latency for sensor big data is typically short and old data are not worth further analysis, so the speed of data preparation is even more significant. We conducted several experiments measuring the overall performance of D-ELT and compared it with both traditional ETL and extracting-loading-transforming (ELT) systems, using different data sizes and levels of analysis complexity. The experimental results show that D-ELT outperforms both ETL and ELT. In addition, the larger the amount of data or the higher the complexity of the analysis, the greater the parallelization effect of the transform step in D-ELT, leading to better performance than the traditional ETL and ELT approaches.
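The abstract does not describe how D-ELT partitions its transform step, so the following is only a hypothetical sketch of what a MapReduce-style transform over IoT sensor readings can look like: a map phase assigns each reading to a geographic grid cell, a shuffle groups values by cell, and a reduce phase averages each group. The record layout `(sensor_id, latitude, longitude, value)`, the 0.5-degree grid, and the names `map_record`, `shuffle`, `reduce_cell`, and `transform` are all assumptions for illustration. A `ThreadPoolExecutor` stands in for a cluster; in CPython, threads do not speed up CPU-bound work, so a real deployment would run the map and reduce tasks on a distributed framework such as Hadoop MapReduce.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable, List, Tuple

# Hypothetical record layout (assumption): (sensor_id, latitude, longitude, value)
Record = Tuple[str, float, float, float]
Cell = Tuple[int, int]

CELL_SIZE = 0.5  # assumed grid resolution in degrees (exact in binary floating point)


def map_record(rec: Record) -> Tuple[Cell, float]:
    """Map phase: emit a (grid cell, value) pair for one raw sensor reading."""
    _, lat, lon, value = rec
    cell = (int(lat // CELL_SIZE), int(lon // CELL_SIZE))
    return cell, value


def shuffle(pairs: Iterable[Tuple[Cell, float]]) -> Dict[Cell, List[float]]:
    """Shuffle phase: group all mapped values by their grid-cell key."""
    groups: Dict[Cell, List[float]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_cell(item: Tuple[Cell, List[float]]) -> Tuple[Cell, float]:
    """Reduce phase: collapse one cell's values into their mean."""
    cell, values = item
    return cell, sum(values) / len(values)


def transform(records: Iterable[Record], workers: int = 4) -> Dict[Cell, float]:
    """Run map -> shuffle -> reduce, fanning map and reduce tasks out to a worker pool."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = pool.map(map_record, records)          # map tasks run in parallel
        grouped = shuffle(mapped)                        # barrier: group by key
        reduced = pool.map(reduce_cell, grouped.items()) # reduce tasks run in parallel
        return dict(reduced)
```

For example, `transform([("s1", 37.51, 127.02, 10.0), ("s2", 37.60, 127.10, 20.0), ("s3", 38.10, 127.60, 5.0)])` puts the first two readings in the same cell and averages them. Because each map call touches one record and each reduce call touches one key, both phases can be spread across more workers as the data grows, which is the kind of scaling the abstract attributes to D-ELT's parallelized transform.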
Jo, J., & Lee, K. W. (2019). MapReduce-Based D-ELT framework to address the challenges of geospatial big data. ISPRS International Journal of Geo-Information, 8(11). https://doi.org/10.3390/ijgi8110475