Assessing the Dependability of Apache Spark System: Streaming Analytics on Large-Scale Ocean Data

Janak Dahal; Elias Ioup; Shaikh Arifuzzaman; Mahdi Abdelguerfi

Conference Proceedings

Communications in Computer and Information Science (2019) 1123 CCIS 131-144

DOI: 10.1007/978-981-15-1304-6_11

Abstract

Real-world data from diverse domains require scalable, real-time analysis. Large-scale data processing frameworks and engines such as Hadoop fall short when results are needed on the fly. Apache Spark’s streaming library is becoming an increasingly popular choice because it can stream and analyze a significant amount of data. In this paper, we analyze large-scale geo-temporal data collected from the USGODAE (United States Global Ocean Data Assimilation Experiment) data catalog to showcase and assess the dependability of Spark stream processing. We measure streaming latency and monitor scalability by adding and removing nodes in the middle of a streaming job. We also verify fault tolerance by stopping nodes in the middle of a job and confirming that the job is rescheduled and completed on other nodes. We design a full-stack application that automates data collection, data processing, and visualization of the results. We use the Google Maps API to visualize the results by color-coding the world map with values from various analytics.

Cite (APA)

Dahal, J., Ioup, E., Arifuzzaman, S., & Abdelguerfi, M. (2019). Assessing the Dependability of Apache Spark System: Streaming Analytics on Large-Scale Ocean Data. In Communications in Computer and Information Science (Vol. 1123 CCIS, pp. 131–144). Springer. https://doi.org/10.1007/978-981-15-1304-6_11
