In recent years, the number of semi-structured documents available electronically has increased dramatically. Semi-structured documents are usually difficult to reuse because they lack explicit metadata. To enable integration and retrieval over semi-structured documents, the essential aspects of the documents should be described explicitly by metadata. Such metadata can be assigned to documents, and can represent part of their information content, using various information extraction (IE) techniques. This paper also provides a flexible user interaction mechanism that achieves better performance with fewer training sample documents. In semantic view extraction, we use similarity-based rule induction to improve the rule learning procedure. Experimental results show that our approach can significantly outperform most existing wrapper methods. We make use of the semantics that reside in the logical structure of a document to help find relations between semantic entities. After the documents have been semantically annotated, TIPSI allows them to be indexed with respect to the extracted text entities. To answer a query, TIPSI applies semantic restrictions over the entities in the knowledge base (KB). © Springer-Verlag Berlin Heidelberg 2014.
CITATION STYLE
Zhang, K., Li, J. Z., Hong, M. C., Yan, X. D., & Song, Q. (2014). A Semantics Enabled Intelligent Semi-structured Document Processor. In Communications in Computer and Information Science (Vol. 426 CCIS, pp. 328–344). Springer Verlag. https://doi.org/10.1007/978-3-662-43908-1_41