Traditional methods for extracting article content from Web news pages are time-consuming and maintenance-heavy because they analyze the layout of news pages to generate wrappers, either manually or automatically. In this paper, we propose a relevance-based analysis method that extracts article content from news pages without analyzing page layouts beforehand. The method is applicable to general news pages, and we present implementations of news extraction for different kinds of news sources. © 2009 Springer Berlin Heidelberg.
CITATION STYLE
Han, H., & Tokuda, T. (2009). A layout-independent web news article contents extraction method based on relevance analysis. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 5648 LNCS, pp. 453–460). https://doi.org/10.1007/978-3-642-02818-2_37