The research of web parallel information extraction based on Hadoop

Songyu Ma; Quan Shi; Lu Xu

Journal Article

Advances in Intelligent Systems and Computing (2014) 255 341-348

DOI: 10.1007/978-81-322-1759-6_41

Citations: 1

Readers: 5

Abstract

Big data, driven by three major trends (cloud computing, social computing, and mobile computing), is reshaping business processes, IT infrastructure, and the way enterprise, customer, and Internet information is captured and used. To extract big data from the Internet, an enterprise needs a scalable, flexible, and manageable data infrastructure. This paper therefore analyzes and designs a large-scale information extraction system based on the Hadoop framework. Measurements show that extracting huge amounts of data on a cluster performs considerably better than extraction on a single machine, with high reliability and scalability. In addition, this research provides better technical solutions for Web information extraction and sensitive information.

Cite (APA)

Ma, S., Shi, Q., & Xu, L. (2014). The research of web parallel information extraction based on Hadoop. Advances in Intelligent Systems and Computing, 255, 341–348. https://doi.org/10.1007/978-81-322-1759-6_41
