Big data is being generated in a wide variety of formats at an exponential rate. Big data analytics deals with processing and analyzing voluminous data to provide useful insight for guided decision making. Traditional data storage and management tools are not well equipped to handle big data and its applications. Apache Hadoop is a popular open-source platform that supports the storage and processing of extremely large datasets. The Hadoop ecosystem provides a variety of tools for big data analytics, but a tool must be selected that is best suited to the specific analytics requirement at hand. Each tool has its own advantages and drawbacks relative to the others. Some of them have overlapping business use cases; however, they differ in critical functional areas. There is therefore a need to consider the trade-offs between usability and suitability when selecting a tool from the Hadoop ecosystem. This paper identifies the requirements for Big Data Analytics (BDA) and maps them to the tools of the Hadoop framework that are best suited to them. To this end, we have categorized Hadoop tools according to their functionality and usage. The tools are discussed from the users’ perspective, along with their pros and cons, if any. For each identified category, a comparison of Hadoop tools based on important parameters is also presented. The tools have been studied and analyzed for their suitability for the different requirements of big data analytics. The resulting mapping of big data analytics requirements to Hadoop tools is intended for use by data analysts and predictive modelers.
CITATION STYLE
Bharti, U., Bajaj, D., Goel, A., & Gupta, S. C. (2019). Identifying requirements for big data analytics and mapping to Hadoop tools. International Journal of Recent Technology and Engineering, 8(3), 4384–4392. https://doi.org/10.35940/ijrte.C5524.098319