Capturing Semantics in HTML Documents

Mengchi Liu

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2002) 2453 103-112

DOI: 10.1007/3-540-46146-9_11

Abstract

Most documents available over the web conform to the HTML specification. They are intended to be human readable through a web browser and are thus constructed following some common conventions. Based on such common conventions, the Conceptual Model for HTML was proposed recently to automatically capture the hierarchical structure within web documents. However, certain key semantic information about the contents of the documents, which is obvious to humans, is often omitted. As a result, web data processing, manipulation and integration are still quite difficult. In this paper, we discuss how to extend the Conceptual Model for HTML to capture the intended semantics of HTML documents. We show that, with the new constructs introduced, an Intelligent Wrapper, and limited human interaction, semantics can be transferred from humans into the Extended Conceptual Model, making further meaningful processing, manipulation and integration of web documents possible.
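The abstract does not spell out how the Conceptual Model for HTML derives structure from presentation conventions. As a rough illustration of the general idea only, the Python sketch below assumes a single such convention (that heading levels h1 to h6 nest sections) and builds a section tree from it using the standard-library `html.parser` module. The names `Section`, `OutlineParser` and `outline` are hypothetical; this is not the paper's model, its Extended Conceptual Model, or its Intelligent Wrapper.

```python
from html.parser import HTMLParser

# Assumed convention for this sketch: <h1>..<h6> open nested sections.
HEADING_LEVELS = {f"h{i}": i for i in range(1, 7)}


class Section:
    """A node in the inferred hierarchy (hypothetical structure)."""

    def __init__(self, title, level):
        self.title = title
        self.level = level
        self.text = []      # non-heading text that belongs to this section
        self.children = []  # nested subsections


class OutlineParser(HTMLParser):
    """Builds a section tree from heading levels; ignores all other tags."""

    def __init__(self):
        super().__init__()
        self.root = Section("(document)", 0)
        self.stack = [self.root]      # path from root to the open section
        self.current_heading = None   # level of the heading being read
        self.heading_text = []

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_LEVELS:
            self.current_heading = HEADING_LEVELS[tag]
            self.heading_text = []

    def handle_endtag(self, tag):
        if tag in HEADING_LEVELS and self.current_heading is not None:
            level = self.current_heading
            title = "".join(self.heading_text).strip()
            # Close every open section at the same or a deeper level.
            while self.stack[-1].level >= level:
                self.stack.pop()
            section = Section(title, level)
            self.stack[-1].children.append(section)
            self.stack.append(section)
            self.current_heading = None

    def handle_data(self, data):
        if self.current_heading is not None:
            self.heading_text.append(data)
        elif data.strip():
            # Attach body text to the innermost open section.
            self.stack[-1].text.append(data.strip())


def outline(section, depth=0):
    """Return section titles, indented two spaces per nesting level."""
    lines = []
    for child in section.children:
        lines.append("  " * depth + child.title)
        lines.extend(outline(child, depth + 1))
    return lines


if __name__ == "__main__":
    doc = ("<h1>Courses</h1><p>Intro</p>"
           "<h2>Fall</h2><p>CS101</p><h2>Winter</h2><p>CS102</p>")
    parser = OutlineParser()
    parser.feed(doc)
    parser.close()
    print("\n".join(outline(parser.root)))
    # prints:
    # Courses
    #   Fall
    #   Winter
```

A tree like this records only nesting: it says nothing about what "Fall" or "CS101" mean, which is the kind of omitted semantic information the abstract says must be supplied with human help.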

Citation (APA)

Liu, M. (2002). Capturing semantics in HTML documents. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 2453, pp. 103–112). Springer Verlag. https://doi.org/10.1007/3-540-46146-9_11