Cloud computing technology has enabled the storage and analysis of large volumes of data, known as big data. Alongside cloud computing, a new discipline in computer science known as Data Science has emerged. Data Science is an interdisciplinary field that draws on statistics, machine learning, predictive analytics and deep learning, and it aims to extract hidden patterns from big data. Because big data requires more storage than traditional storage devices can accommodate, Infrastructure as a Service (IaaS) cloud resources are used to hold it. Big data and big data analytics therefore depend heavily on cloud computing. Big data can also be analysed to obtain Business Intelligence (BI), a process that requires distributed programming frameworks such as Apache Hadoop, Apache Spark, Apache Flink, Apache Storm and Apache Samza. Without a thorough understanding of these frameworks, which run on cloud platforms, it is difficult to use them appropriately. This paper therefore presents a comparative study of these frameworks, together with an empirical evaluation of Apache Flink and Apache Spark. The TeraSort benchmark is used for the experiments.
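For readers unfamiliar with the TeraSort benchmark, its central idea can be illustrated in a few lines. TeraSort samples the input keys to choose split points, range-partitions the records so that each partition covers a disjoint, ordered key range, sorts each partition independently, and concatenates the sorted partitions to obtain a globally sorted output. The sketch below is a minimal single-process illustration under that assumption: the function names (`sample_cut_points`, `range_partition`, `terasort_sketch`) and the `num_partitions` and `sample_size` parameters are hypothetical, plain integers stand in for TeraSort records, and it is not the benchmark code used by Hadoop, Spark or Flink.

```python
import random
from bisect import bisect_right


def sample_cut_points(keys, num_partitions, sample_size, seed=0):
    """Choose num_partitions - 1 split keys from a sorted random sample (hypothetical helper)."""
    rng = random.Random(seed)
    sample = sorted(rng.sample(keys, min(sample_size, len(keys))))
    step = len(sample) / num_partitions
    # Evenly spaced quantiles of the sample become the partition boundaries.
    return [sample[int(step * i)] for i in range(1, num_partitions)]


def range_partition(keys, cut_points):
    """Send each key to the partition whose key range contains it (hypothetical helper)."""
    partitions = [[] for _ in range(len(cut_points) + 1)]
    for k in keys:
        # Partition i holds keys in [cut_points[i-1], cut_points[i]).
        partitions[bisect_right(cut_points, k)].append(k)
    return partitions


def terasort_sketch(keys, num_partitions=4, sample_size=100):
    """Sample, range-partition, sort each partition, then concatenate."""
    if not keys:
        return []
    cuts = sample_cut_points(keys, num_partitions, sample_size)
    partitions = range_partition(keys, cuts)
    # In a distributed framework each partition would be sorted by a separate task.
    sorted_parts = [sorted(p) for p in partitions]
    # Partitions cover disjoint, ordered key ranges, so concatenation is globally sorted.
    return [k for part in sorted_parts for k in part]
```

The sampling step matters because evenly spaced cut points drawn from the data keep partitions roughly balanced, so no single task receives most of the records. In Spark, a comparable effect is obtained through `sortByKey`, which relies on a range partitioner built from sampled keys.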
Patil, A. (2019). Distributed programming frameworks in cloud platforms. International Journal of Recent Technology and Engineering, 7(6), 611–619.