Recognition of data records in semi-structured web-pages using ontology and χ² statistical distribution

Abstract

Information extraction (IE) has emerged as a new discipline in computer science. IE systems use intelligent algorithms to extract the required data and structure it so that it can be queried. Most IE systems rely on the structure of a web page, e.g. its HTML tags, to recognize the target information. This article develops an algorithm that recognizes the main region of a web page containing the target information by means of an ontology, the web-page structure, and the χ² goodness-of-fit test. Once the main region has been recognized, the records in that region are identified, and each record is written to a text file. © 2008 Springer-Verlag Berlin Heidelberg.
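
The abstract does not spell out how the goodness-of-fit test is applied, so the following is only a minimal sketch of one way such a test could rank candidate regions, not the authors' algorithm. It assumes each candidate region has already been reduced to a list of tokens, and that the ontology supplies an expected proportion for each concept term. The names `chi_square_statistic`, `pick_main_region`, and `expected_proportions`, and the sample terms, are hypothetical.

```python
from collections import Counter


def chi_square_statistic(observed, expected_proportions):
    """Pearson goodness-of-fit statistic: sum((O - E)^2 / E) over all terms."""
    total = sum(observed.values())
    if total == 0:
        # Assumption: a region with no ontology terms is the worst possible fit.
        return float("inf")
    stat = 0.0
    for term, proportion in expected_proportions.items():
        expected = proportion * total
        diff = observed.get(term, 0) - expected
        stat += diff * diff / expected
    return stat


def pick_main_region(regions, expected_proportions):
    """Return (index of best-fitting region, list of per-region statistics)."""
    scores = []
    for tokens in regions:
        # Count only tokens that the (hypothetical) ontology knows about.
        counts = Counter(t for t in tokens if t in expected_proportions)
        scores.append(chi_square_statistic(counts, expected_proportions))
    best = min(range(len(regions)), key=scores.__getitem__)
    return best, scores


# Hypothetical ontology expectation: both concepts equally frequent in a record list.
expected_proportions = {"price": 0.5, "title": 0.5}
regions = [
    ["home", "login", "price"],            # navigation-like region
    ["title", "price", "title", "price"],  # record-like region
    ["copyright", "contact"],              # footer with no ontology terms
]
best, scores = pick_main_region(regions, expected_proportions)
print(best, scores)
```

In this sketch a lower statistic means the observed term counts are closer to the expected ones, so the region with the smallest value is taken as the main region. A real system would also compare the statistic against a χ² critical value for the appropriate degrees of freedom before accepting a region.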

Cite

Keshavarzi, A., Rahmani, A. M., Mohsenzadeh, M., & Keshavarzi, R. (2008). Recognition of data records in semi-structured web-pages using ontology and χ² statistical distribution. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 5139 LNAI, pp. 675–682). Springer Verlag. https://doi.org/10.1007/978-3-540-88192-6_71
