Pattern-based extraction of addresses from web page content

Saeid Asadi; Guowei Yang; Xiaofang Zhou; Yuan Shi; Boxuan Zhai; Wendy Wen Rong Jiang

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2008), Vol. 4976 LNCS, pp. 407-418

DOI: 10.1007/978-3-540-78849-2_41

Abstract

Extraction of addresses and location names from Web pages is a challenging task for search engines. Traditional information extraction and natural language processing models remain unsuccessful in the context of the Web because of the uncontrolled, heterogeneous nature of Web resources and the effects of HTML and other markup tags. We describe a new pattern-based approach for extracting addresses from Web pages. Both HTML-based and vision-based segmentation are used to improve the quality of address extraction. The proposed system uses several address patterns and a small table of geographic knowledge to detect addresses and then itemize them into their components. Experiments show that this model can extract and itemize different addresses effectively without large gazetteers or human supervision. © 2008 Springer-Verlag Berlin Heidelberg.
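The abstract does not publish the authors' actual patterns or geographic table, so what follows is only a minimal Python sketch of the general idea under stated assumptions: HTML-based segmentation (vision-based segmentation is not shown), a single hypothetical street-address pattern, and a tiny illustrative table of Australian state abbreviations standing in for the "small table of geographic knowledge". All names (BlockSegmenter, BLOCK_TAGS, ADDRESS_RE, STATES, STREET_TYPES, itemize, extract_addresses) are hypothetical and do not come from the paper.

```python
import re
from html.parser import HTMLParser

# Hypothetical block-level tags treated as segment boundaries.
# <br> is deliberately NOT included, so a multi-line address stays in one segment.
BLOCK_TAGS = {
    "p", "div", "td", "th", "tr", "li", "table", "address", "section",
    "article", "header", "footer", "h1", "h2", "h3", "h4", "h5", "h6",
}

# Illustrative stand-in for a small geographic knowledge table (assumption):
# Australian state/territory abbreviations mapped to full names.
STATES = {
    "QLD": "Queensland", "NSW": "New South Wales", "VIC": "Victoria",
    "SA": "South Australia", "WA": "Western Australia", "TAS": "Tasmania",
    "NT": "Northern Territory", "ACT": "Australian Capital Territory",
}
# Street types, long forms listed before abbreviations ("Street" before "St").
STREET_TYPES = ["Street", "Road", "Avenue", "Drive", "Lane", "St", "Rd", "Ave", "Dr"]

# One hypothetical pattern: <number> <Street Name> <type>, <Suburb> <STATE> <postcode>
ADDRESS_RE = re.compile(
    r"(?P<number>\d+[A-Za-z]?)\s+"
    r"(?P<street>(?:[A-Z][a-z]+\s+)+?)"
    r"(?P<type>" + "|".join(STREET_TYPES) + r")\b\.?,?\s+"
    r"(?P<suburb>(?:[A-Z][a-z]+\s+)+?)"
    r"(?P<state>" + "|".join(STATES) + r")\s+"
    r"(?P<postcode>\d{4})\b"
)


class BlockSegmenter(HTMLParser):
    """Splits a page into whitespace-normalized text segments at block-level tags."""

    def __init__(self):
        super().__init__()
        self.segments = []
        self._buf = []
        self._skip = 0  # > 0 while inside <script> or <style>

    def _flush(self):
        # Join buffered text pieces and collapse all runs of whitespace.
        text = " ".join(" ".join(self._buf).split())
        if text:
            self.segments.append(text)
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip = max(0, self._skip - 1)
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_data(self, data):
        if not self._skip:  # ignore script/style contents
            self._buf.append(data)

    def close(self):
        super().close()
        self._flush()  # emit any trailing text


def itemize(match):
    """Breaks one matched address into named components."""
    g = match.groupdict()
    return {
        "number": g["number"],
        "street": f"{g['street'].strip()} {g['type']}",
        "suburb": g["suburb"].strip(),
        "state": STATES[g["state"]],  # expand abbreviation via the table
        "postcode": g["postcode"],
    }


def extract_addresses(html):
    """Segments the page, then runs the address pattern over each segment."""
    segmenter = BlockSegmenter()
    segmenter.feed(html)
    segmenter.close()
    return [itemize(m) for text in segmenter.segments for m in ADDRESS_RE.finditer(text)]


if __name__ == "__main__":
    page = ("<html><body><script>var x = '1 Fake Street, Nowhere QLD 9999';</script>"
            "<div><p>Visit us:<br>12 Smith Street,<br>St Lucia QLD 4067</p></div>"
            "<p>Open 9am to 5pm</p></body></html>")
    print(extract_addresses(page))
```

Splitting on block-level tags before matching keeps the pattern from joining text across unrelated page regions (a street line in one table cell and a suburb in another are not merged), while <br> is treated as ordinary whitespace so a multi-line postal address is still matched as one unit. A real system along the lines of the abstract would use several such patterns rather than one.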

Citation (APA)

Asadi, S., Yang, G., Zhou, X., Shi, Y., Zhai, B., & Jiang, W. W. R. (2008). Pattern-based extraction of addresses from web page content. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 4976 LNCS, pp. 407–418). https://doi.org/10.1007/978-3-540-78849-2_41