Information extraction (IE) has emerged as a new discipline in computer science. In IE, intelligent algorithms are employed to extract the required data and to structure it so that it can be queried. Most IE systems rely on the structure of a web page, e.g. its HTML tags, to recognize the desired information. In this article, an algorithm is developed that recognizes the main region of a web page containing the desired information by means of an ontology, the web-page structure, and a χ² goodness-of-fit test. Once the main region has been recognized, the data records it contains are identified, and each record is written to a text file. © 2008 Springer-Verlag Berlin Heidelberg.
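The abstract does not say exactly how the χ² test is applied, so the following is only a hypothetical sketch of one way a χ² goodness-of-fit test could help pick a record-bearing region. It assumes that each candidate region has already been split into child nodes and that, for each child, the number of ontology-term matches has been counted. The idea (an assumption, not necessarily the authors' method) is that a list of data records spreads ontology matches roughly evenly across its children, so the hypothesis of a uniform distribution should not be rejected. The names `chi_square_uniform`, `looks_like_record_list`, `pick_main_region`, and the `min_expected` threshold are all illustrative.

```python
from typing import Dict, List, Optional

# Upper 5% critical values of the chi-squared distribution, 1..10 degrees of freedom.
CHI2_CRIT_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070,
                6: 12.592, 7: 14.067, 8: 15.507, 9: 16.919, 10: 18.307}


def chi_square_uniform(counts: List[int]) -> float:
    """Pearson chi-squared statistic of observed counts against a uniform expectation."""
    total = sum(counts)
    if total == 0:
        raise ValueError("counts must not all be zero")
    expected = total / len(counts)
    return sum((o - expected) ** 2 / expected for o in counts)


def looks_like_record_list(counts: List[int], min_expected: float = 1.0) -> bool:
    """Hypothetical rule: True if a region's children carry ontology matches evenly,
    i.e. the uniform hypothesis is NOT rejected at the 5% level.

    counts[i] is the (assumed, precomputed) number of ontology-term matches in child i.
    Regions with too few matches per child are skipped, because the chi-squared
    approximation is unreliable for small expected counts.
    """
    df = len(counts) - 1
    if df not in CHI2_CRIT_05:          # only 2..11 children are covered by the table
        return False
    total = sum(counts)
    if total == 0 or total / len(counts) < min_expected:
        return False
    return chi_square_uniform(counts) < CHI2_CRIT_05[df]


def pick_main_region(regions: Dict[str, List[int]]) -> Optional[str]:
    """Among regions that pass the test, return the one with the most ontology matches."""
    passing = {name: c for name, c in regions.items() if looks_like_record_list(c)}
    if not passing:
        return None
    return max(passing, key=lambda name: sum(passing[name]))


if __name__ == "__main__":
    regions = {
        "div#nav": [9, 0, 0, 1],           # matches concentrated in one child
        "table#results": [4, 5, 3, 4, 4],  # matches spread evenly across children
        "div#footer": [1, 0],              # too few matches to test reliably
    }
    print(pick_main_region(regions))
```

Running the script prints `table#results`: the navigation block is rejected (χ² = 22.8 > 7.815 with 3 degrees of freedom), and the footer is skipped for having too few matches. Choosing "most matches among non-rejected regions" is just one simple tie-breaking design; the original algorithm also uses the ontology and page structure in ways the abstract does not detail.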
Keshavarzi, A., Rahmani, A. M., Mohsenzadeh, M., & Keshavarzi, R. (2008). Recognition of data records in semi-structured web-pages using ontology and χ² statistical distribution. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 5139 LNAI, pp. 675–682). Springer Verlag. https://doi.org/10.1007/978-3-540-88192-6_71