In the chapters so far, you have relied on HDFS as your storage medium. It has two major advantages for the type of processing we wanted to do: it excels at storing large files, and it enables distributed processing of those files with the help of MapReduce. HDFS is most efficient for tasks that require a pass through all the data in a file (or a set of files). If you only need to access a single element in a dataset (an operation sometimes called a point query) or a contiguous range of elements (sometimes called a range query), HDFS does not provide an efficient toolkit for the task. You are forced to scan all elements and pick out the ones you are interested in.
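The difference between a full scan and a point or range query can be illustrated with a minimal in-memory sketch. This is not HDFS, MapReduce, or any particular database: the `records` list, the key layout, and all function names are hypothetical, chosen only to show why a scan touches every element while a keyed or sorted structure does not.

```python
from bisect import bisect_left, bisect_right

# Hypothetical dataset of (key, value) records, standing in for a flat file.
records = [(k, f"value-{k}") for k in range(0, 1000, 5)]


def scan_point(records, key):
    """Point query by full scan: every record is examined."""
    return [value for k, value in records if k == key]


def scan_range(records, lo, hi):
    """Range query by full scan: again, every record is examined."""
    return [value for k, value in records if lo <= k <= hi]


# Assumed alternative: a key-to-value map plus a sorted list of keys.
index = dict(records)
sorted_keys = sorted(index)


def indexed_point(index, key):
    """Point query via hash lookup: no pass over the data."""
    return index.get(key)


def indexed_range(index, sorted_keys, lo, hi):
    """Range query via binary search over the sorted keys."""
    start = bisect_left(sorted_keys, lo)
    end = bisect_right(sorted_keys, hi)
    return [index[k] for k in sorted_keys[start:end]]
```

Both approaches return the same answers. The scan versions do work proportional to the size of the dataset for every query, whereas the indexed versions do work proportional to the size of the result plus a lookup cost.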
Wiktorski, T. (2019). NOSQL databases. In Advanced Information and Knowledge Processing (pp. 77–84). Springer London. https://doi.org/10.1007/978-3-030-04603-3_8