Reformatting web documents via header trees

Minoru Yoshida; Hiroshi Nakagawa

Conference Proceedings (Open Access)

ACL-05 - 43rd Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (2005) 121-124

DOI: 10.3115/1225753.1225784

Abstract

We propose a new method for reformatting web documents by extracting semantic structures from web pages. Our approach is to extract trees that describe hierarchical relations in documents. We developed an algorithm for this task by employing the EM algorithm and clustering techniques. Preliminary experiments showed that our approach was more effective than baseline methods. © 2005 Association for Computational Linguistics.

Cite (APA)

Yoshida, M., & Nakagawa, H. (2005). Reformatting web documents via header trees. In ACL-05 - 43rd Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (pp. 121–124). Association for Computational Linguistics (ACL). https://doi.org/10.3115/1225753.1225784
