Recognition of data records in semi-structured web-pages using ontology and χ² statistical distribution

Amin Keshavarzi; Amir Masoud Rahmani; Mehran Mohsenzadeh; Reza Keshavarzi

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2008), Vol. 5139 LNAI, pp. 675–682

DOI: 10.1007/978-3-540-88192-6_71

Abstract

Information extraction (IE) has emerged as a novel discipline in computer science. In IE, intelligent algorithms are employed to extract the required data and structure them so that they can be queried. Most IE systems use the structure of a web page, e.g. its HTML tags, to recognize the target information. In this article, an algorithm is developed that recognizes the main region of a web page containing the target information by means of an ontology, the web-page structure, and the χ² goodness-of-fit test. After the main region is recognized, the records in that region are identified, and each record is written to a text file. © 2008 Springer-Verlag Berlin Heidelberg.
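
The abstract does not spell out how the ontology and the χ² goodness-of-fit test are combined, so the following is only a minimal sketch under stated assumptions, not the authors' algorithm. It assumes that each candidate region is available as plain text, that the ontology is reduced to a list of terms with an expected share for each, and that the region whose term counts fit those shares best (smallest Pearson χ² statistic) is taken as the main region. The names `chi_square_statistic`, `count_terms`, and `pick_main_region`, as well as the sample data, are hypothetical.

```python
def chi_square_statistic(observed, expected):
    """Pearson goodness-of-fit statistic: sum of (O - E)^2 / E."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if any(e <= 0 for e in expected):
        raise ValueError("expected counts must be positive")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def count_terms(text, terms):
    """Count whitespace-separated occurrences of each ontology term (assumed matching rule)."""
    words = text.lower().split()
    return [words.count(term) for term in terms]


def pick_main_region(regions, terms, expected_share):
    """Return (name, statistic) of the region that best fits the expected term shares.

    Assumption: a smaller statistic means a better fit; regions that contain
    no ontology term at all are skipped.
    """
    best_name, best_stat = None, float("inf")
    for name, text in regions.items():
        observed = count_terms(text, terms)
        total = sum(observed)
        if total == 0:
            continue
        # Scale the expected shares to the number of terms found in this region.
        expected = [total * share for share in expected_share]
        stat = chi_square_statistic(observed, expected)
        if stat < best_stat:
            best_name, best_stat = name, stat
    return best_name, best_stat


# Hypothetical example: two regions of a page and a three-term ontology.
regions = {
    "nav": "price price price price home",
    "main": "title title price author",
}
terms = ["title", "price", "author"]
expected_share = [0.5, 0.25, 0.25]

print(pick_main_region(regions, terms, expected_share))  # ('main', 0.0)
```

In this toy input the "main" region matches the expected shares exactly, while the "nav" region repeats a single term and therefore yields a much larger statistic. A fuller treatment would compare the statistic against a χ² critical value for the appropriate degrees of freedom rather than only ranking regions.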

Citation (APA)

Keshavarzi, A., Rahmani, A. M., Mohsenzadeh, M., & Keshavarzi, R. (2008). Recognition of data records in semi-structured web-pages using ontology and χ² statistical distribution. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 5139 LNAI, pp. 675–682). Springer Verlag. https://doi.org/10.1007/978-3-540-88192-6_71
