Abstract
Data analysis has become a challenge in recent years because the volume of data generated has become difficult to manage; more hardware and software resources are therefore needed to store and process it. Apache Hadoop is a free framework, widely used thanks to the Hadoop Distributed File System (HDFS) and its ability to integrate with other data processing and analysis components, such as MapReduce for data processing, Spark for in-memory data processing, Apache Drill for SQL on Hadoop, and many others. In this paper, we analyze the Hadoop framework implementation through a comparative study of single-node and multi-node Hadoop clusters. We explain in detail the two layers at the base of the Hadoop architecture: the HDFS layer, with its NameNode, Secondary NameNode, and DataNode daemons, and the MapReduce layer, with its JobTracker and TaskTracker daemons. This work is part of a larger effort aimed at data processing in Data Lake structures.
Zagan, E., & Danubianu, M. (2021). HADOOP: A Comparative Study between Single-Node and Multi-Node Cluster. International Journal of Advanced Computer Science and Applications, 12(2), 53–58. https://doi.org/10.14569/IJACSA.2021.0120207