Data centric text processing using MapReduce

N. Sandhya; Philip Samuel

Conference Proceedings


Advances in Intelligent Systems and Computing (2016) 424 129-137

DOI: 10.1007/978-3-319-28031-8_11


Abstract

Processing huge volumes of data has opened new opportunities in e-commerce, engineering, business, and large computing applications. The MapReduce programming model is a parallel data processing approach for execution on computer clusters. It provides an abstraction for designing scalable algorithms for big data processing. For batch-style workloads, the MapReduce model provides faster computation. However, the key/value pair generation of a MapReduce program creates memory overhead and deserialization overhead due to data redundancy. Data redundancy is one of the most important factors that consume space and affect system performance when working with large datasets. This overhead can be largely avoided by using a novel approach that we developed, the Data Triggered Multithreaded Programming (DTMP) model. In this paper, we demonstrate the use of the DTMP model on a large dataset of authors and their publications. DTMP can dynamically allocate resources and can identify data repetition occurring during computation. When applied to the MapReduce programming model, the DTMP model improves system performance. The major contribution of this work is a simple, scalable, and powerful approach to processing text data that enables automatic parallelization and distribution of large-scale computations.
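
The redundancy the abstract refers to can be illustrated with a minimal, single-process sketch of the map, shuffle, and reduce steps. This is not the authors' DTMP model or their dataset: the records, the function names (`map_phase`, `shuffle`, `reduce_phase`), and the "redundant pairs" count are all assumptions made for illustration. It only shows that a plain mapper emits one key/value pair per record, so a repeated author name produces repeated keys.

```python
from collections import defaultdict

# Hypothetical (author, publication title) records; not the paper's dataset.
records = [
    ("A. Author", "Paper one"),
    ("B. Writer", "Paper two"),
    ("A. Author", "Paper three"),
    ("A. Author", "Paper four"),
]

def map_phase(records):
    """Emit one (author, 1) pair per record, as a classic mapper would."""
    return [(author, 1) for author, _title in records]

def shuffle(pairs):
    """Group emitted values by key, standing in for the framework's shuffle."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Sum the values for each key to get publications per author."""
    return {key: sum(values) for key, values in grouped.items()}

pairs = map_phase(records)
counts = reduce_phase(shuffle(pairs))

# Pairs whose key had already been emitted: a rough proxy (an assumption
# of this sketch) for the repeated data the abstract describes.
redundant_pairs = len(pairs) - len(counts)

print(counts)
print(redundant_pairs)
```

In this toy input, four pairs are emitted for two distinct authors, so half of the intermediate pairs repeat a key that was already seen. In a real cluster each such pair would also be serialized and deserialized between the map and reduce stages.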


Cite


Sandhya, N., & Samuel, P. (2016). Data centric text processing using MapReduce. In Advances in Intelligent Systems and Computing (Vol. 424, pp. 129–137). Springer Verlag. https://doi.org/10.1007/978-3-319-28031-8_11
