A layout-independent web news article contents extraction method based on relevance analysis

Hao Han; Takehiro Tokuda

Conference Proceedings, Open Access

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2009) 5648 LNCS 453-460

DOI: 10.1007/978-3-642-02818-2_37

Abstract

Traditional Web news article content extraction methods are time-consuming and require considerable maintenance because they analyze the layout of news pages to generate wrappers, either manually or automatically. In this paper, we propose a relevance-based analysis method that extracts news article content from news pages without analyzing page layouts before extraction. The method is applicable to general news pages, and we present implementations of news extraction from different kinds of news sources. © 2009 Springer Berlin Heidelberg.

Cite

Han, H., & Tokuda, T. (2009). A layout-independent web news article contents extraction method based on relevance analysis. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 5648 LNCS, pp. 453–460). https://doi.org/10.1007/978-3-642-02818-2_37
