Reformatting web documents via header trees

Abstract

We propose a new method for reformatting web documents by extracting semantic structures from web pages. Our approach is to extract trees that describe hierarchical relations in documents. We developed an algorithm for this task by employing the EM algorithm and clustering techniques. Preliminary experiments showed that our approach was more effective than baseline methods. © 2005 Association for Computational Linguistics.
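The abstract does not spell out the algorithm. To show what a header tree is, the sketch below assumes each text block has already been given a header level. In the paper, that labeling comes from a method based on EM and clustering, and that method is not reproduced here. The sketch only shows how such levels nest into a tree using a stack. The names `Node`, `build_header_tree`, and `outline` are hypothetical. So is the level convention: 1 is the top level, larger numbers are deeper, and `None` marks body text.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    """A node in a (hypothetical) header tree: a text block and its children."""
    text: str
    level: Optional[int]  # 0 = root, >= 1 = header depth, None = body text
    children: List["Node"] = field(default_factory=list)


def build_header_tree(blocks: List[Tuple[str, Optional[int]]]) -> Node:
    """Nest blocks under the nearest preceding header of a shallower level.

    Assumes header levels are already known (e.g. from an earlier
    clustering step); this function only builds the hierarchy.
    """
    root = Node(text="<root>", level=0)
    stack: List[Node] = [root]  # path from the root to the open header
    for text, level in blocks:
        if level is None:
            # Body text attaches to the innermost open header.
            stack[-1].children.append(Node(text=text, level=None))
            continue
        if level < 1:
            raise ValueError(f"header level must be >= 1, got {level}")
        # Close every open header at the same or a deeper level.
        while stack[-1].level >= level:
            stack.pop()
        node = Node(text=text, level=level)
        stack[-1].children.append(node)
        stack.append(node)
    return root


def outline(node: Node, depth: int = 0) -> List[str]:
    """Flatten the tree into lines indented two spaces per nesting depth."""
    lines: List[str] = []
    for child in node.children:
        lines.append("  " * depth + child.text)
        lines.extend(outline(child, depth + 1))
    return lines


if __name__ == "__main__":
    blocks = [
        ("Products", 1),
        ("Laptops", 2),
        ("Light and fast.", None),
        ("Desktops", 2),
        ("Support", 1),
        ("Contact us by email.", None),
    ]
    print("\n".join(outline(build_header_tree(blocks))))
```

With the sample blocks, "Desktops" closes "Laptops" because both are at level 2. "Support" then closes both "Desktops" and "Products" and becomes a new top-level branch. A stack is used because a document is read top to bottom, and each new header can only attach to one of the headers that is still open.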

Citation

Yoshida, M., & Nakagawa, H. (2005). Reformatting web documents via header trees. In ACL-05 - 43rd Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (pp. 121–124). Association for Computational Linguistics (ACL). https://doi.org/10.3115/1225753.1225784