The Automatic Extraction of Web Information Based on Regular Expression

Li Ji; Jiang Guangyu; Xu Aijun; Wang Yunzhen

Journal Article, Open Access

Journal of Software (2017) 12(4) 180-188

DOI: 10.17706/jsw.12.3.180-188

Abstract

Using a search engine as its starting point, this paper builds a model for Web information retrieval, matching, and structured extraction, and implements an algorithm that locates and automatically extracts Baidu news information across multiple result pages. A standard mathematical expression for the URLs is derived by analyzing the URLs of the search results, and regular expressions for the key tags are designed by analyzing the DOM tree structure of the web pages. Together these yield a method for multi-page location retrieval and structured extraction based on a search engine. Experimental results show an average extraction rate of 99.60% and a matching ratio of 99.56%. The method can be used to structure Web information, extract it automatically, and save it locally.
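
The abstract does not give the paper's URL expression or its tag patterns, so the following is only a minimal Python sketch of the general idea: generate one URL per result page from a template, then pull titles and links out of each page with a regular expression over the key tags. The URL template, the parameter names `word` and `pn`, the `c-title` class, and the helper names `page_urls` and `extract_items` are all hypothetical and are not taken from the paper or from Baidu.

```python
import re
from urllib.parse import quote

# Hypothetical paginated search URL; host and parameter names are assumptions.
URL_TEMPLATE = "https://news.example.com/search?word={query}&pn={offset}"


def page_urls(query, pages, page_size=10):
    """Build one URL per result page by stepping the offset parameter."""
    encoded = quote(query)
    return [
        URL_TEMPLATE.format(query=encoded, offset=i * page_size)
        for i in range(pages)
    ]


# Hypothetical key-tag pattern: each result title is an <h3 class="c-title">
# element wrapping a single link. Real pages would need their own pattern.
ITEM_RE = re.compile(
    r'<h3[^>]*class="c-title"[^>]*>\s*'
    r'<a[^>]*href="(?P<url>[^"]+)"[^>]*>(?P<title>.*?)</a>',
    re.S,
)
TAG_RE = re.compile(r"<[^>]+>")  # strips inline markup such as <em> from titles


def extract_items(html):
    """Return a list of {'title', 'url'} records found in one result page."""
    items = []
    for match in ITEM_RE.finditer(html):
        title = TAG_RE.sub("", match.group("title")).strip()
        items.append({"title": title, "url": match.group("url")})
    return items
```

In use, each URL from `page_urls` would be fetched and its HTML passed to `extract_items`, with the resulting records written to local storage. Regular expressions are fragile against markup changes, so a pattern like this would have to be tuned to the actual DOM structure of the target pages.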

Citation (APA)

Ji, L., Guangyu, J., Aijun, X., & Yunzhen, W. (2017). The Automatic Extraction of Web Information Based on Regular Expression. Journal of Software, 12(4), 180–188. https://doi.org/10.17706/jsw.12.3.180-188
