In big data environments, MapReduce can improve the efficiency of iterative algorithms on massive data by running them on larger PC clusters. However, re-iterating over the entire data set whenever new data is introduced is inefficient. In this paper, an incremental iterative computing model (I2M) based on the incremental data and the original iterative results is proposed. MapReduce- and I2M-based versions of descendant query, PageRank, and K-means are then presented. Finally, an incremental iterative computing framework (I2F) is implemented by extending HaLoop to support incremental iterative computing. A series of test cases is designed to evaluate I2F in terms of functionality, performance, and the cost of incremental iteration. The proposed incremental iterative model can be adapted to many iterative algorithms and promotes the application and optimization of iterative algorithms in big data environments.