Effective and efficient web reviews extraction based on Hadoop

Jian Wan; Jiawei Yan; Congfeng Jiang; Li Zhou; Zujie Ren; Yongjian Ren

Conference Proceedings, Open Access

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2013) 7759 LNCS 107-118

DOI: 10.1007/978-3-642-37804-1_12

Abstract

The rapid development of Web 2.0 has led to a proliferation of web reviews. Traditional web review extraction methods perform poorly on massive data. To address this problem, we propose an effective and efficient approach to extracting web reviews based on Hadoop. The approach overcomes the inefficiency of traditional methods on large-scale data while extracting massive data sets accurately and efficiently. It consists of two components: a review record extraction algorithm based on node similarity, and a review content extraction algorithm based on text depth. We design an automatic web review extraction system based on Hadoop. Finally, we test the extraction system on massive sets of web review pages. The experimental results show that the system achieves an accuracy of more than 96% and a higher speedup than traditional web extraction. © Springer-Verlag 2013.
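
The abstract names the two extraction components but does not define them. The following is a minimal, hypothetical sketch of how such components could work, not the authors' method. It assumes that "node similarity" is the difflib SequenceMatcher ratio between the pre-order tag sequences of two sibling DOM subtrees (records are the largest group of siblings scoring at least an assumed threshold of 0.8), and that "text depth" selection means taking the text at the depth below a record that holds the most characters. All names (TreeBuilder, node_similarity, find_review_records, extract_content, SAMPLE_HTML) are illustrative.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser

VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}

class Node:
    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []
        self.text = ""

class TreeBuilder(HTMLParser):
    """Builds a minimal DOM-like tree (tags plus their direct text) from HTML."""
    def __init__(self):
        super().__init__()
        self.root = Node("root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:  # void elements never receive an end tag
            self.current = node

    def handle_endtag(self, tag):
        # Climb to the matching open element; ignore stray end tags.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        if data.strip():
            self.current.text = (self.current.text + " " + data.strip()).strip()

def tag_sequence(node):
    """Pre-order list of tag names in the subtree rooted at node."""
    seq = [node.tag]
    for child in node.children:
        seq.extend(tag_sequence(child))
    return seq

def node_similarity(a, b):
    """Assumed measure: SequenceMatcher ratio of the two tag sequences (0..1)."""
    return SequenceMatcher(None, tag_sequence(a), tag_sequence(b)).ratio()

def find_review_records(root, threshold=0.8, min_records=2):
    """Largest group of siblings that are all similar to a common first sibling."""
    best, stack = [], [root]
    while stack:
        node = stack.pop()
        stack.extend(node.children)
        kids = node.children
        for i, first in enumerate(kids):
            group = [first] + [k for k in kids[i + 1:]
                               if node_similarity(first, k) >= threshold]
            if len(group) >= min_records and len(group) > len(best):
                best = group
    return best

def texts_with_depth(node, depth=0):
    """Yield (depth, text) for every text-bearing node; node itself is depth 0."""
    if node.text:
        yield depth, node.text
    for child in node.children:
        yield from texts_with_depth(child, depth + 1)

def extract_content(record):
    """Assumed rule: the review body is the text at the depth holding the most text."""
    pairs = list(texts_with_depth(record))
    if not pairs:
        return ""
    totals = {}
    for depth, text in pairs:
        totals[depth] = totals.get(depth, 0) + len(text)
    target = max(totals, key=totals.get)
    return " ".join(text for depth, text in pairs if depth == target)

SAMPLE_HTML = """
<div id="reviews">
  <div class="r"><span>alice</span><div><p>Battery life is excellent and charging is fast.</p></div></div>
  <div class="r"><span>bob</span><div><p>Screen is dim outdoors but fine indoors.</p></div></div>
  <div class="r"><span>carol</span><div><p>Good value for the price.</p></div></div>
</div>
<div id="footer"><a>About</a></div>
"""

if __name__ == "__main__":
    builder = TreeBuilder()
    builder.feed(SAMPLE_HTML)
    for record in find_review_records(builder.root):
        print(extract_content(record))
```

On the sample page, the three structurally identical review blocks form the record group, and each record's body text (nested one level deeper than the reviewer name) is selected. The assumed depth rule cannot separate fields that sit at the same depth, so a real system would need a more refined criterion. Because each page is processed independently, a per-page function of this kind would plausibly map onto the map phase of a Hadoop job (for example via Hadoop Streaming). This is an assumption about deployment, not a description of the paper's system.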

Citation (APA)

Wan, J., Yan, J., Jiang, C., Zhou, L., Ren, Z., & Ren, Y. (2013). Effective and efficient web reviews extraction based on Hadoop. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7759 LNCS, pp. 107–118). https://doi.org/10.1007/978-3-642-37804-1_12