Extracting various types of informative web content via fuzzy sequential pattern mining

Abstract

In this paper, we present a web content extraction method that extracts different types of informative content from news web pages. A fuzzy sequential pattern mining method, FSP, is developed to gradually discover fuzzy sequential patterns for the various types of informative web content. Because the usage of HTML tags may change as web technology develops, the fuzzy sequential patterns are mined from a stable feature instead: the number of tokens in each line of source code. In extensive experiments, the discovered sequential patterns show good clustering properties. The experimental results demonstrate that FSP is effective compared with state-of-the-art content extraction methods. Besides the main articles of web pages, it can also effectively find other types of interesting web content, such as article recommendations and article titles.
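The abstract names the number of tokens in each line of source code as the feature that patterns are mined from. The sketch below illustrates only that feature, plus one possible way to turn a count into fuzzy memberships. It is not the authors' FSP algorithm: the tokenization rule (strip tags, split on whitespace), the labels `short`/`medium`/`long`, the thresholds, and the function names `token_counts` and `fuzzy_memberships` are all assumptions made for illustration.

```python
import re

def token_counts(html_source: str) -> list[int]:
    """Count whitespace-separated text tokens on each source line.

    Assumption: tags that open and close on the same line are stripped
    before counting; the paper may define tokens differently.
    """
    counts = []
    for line in html_source.splitlines():
        text = re.sub(r"<[^>]*>", " ", line)  # drop tags contained in this line
        counts.append(len(text.split()))
    return counts

def fuzzy_memberships(count: int, a: int = 2, b: int = 10, d: int = 30) -> dict[str, float]:
    """Map a token count to hypothetical fuzzy labels whose degrees sum to 1.

    'short' is full at or below a and fades out by b; 'long' starts at b
    and is full at or above d; 'medium' takes the remainder.
    """
    if count <= a:
        short = 1.0
    elif count < b:
        short = (b - count) / (b - a)
    else:
        short = 0.0

    if count <= b:
        long_ = 0.0
    elif count < d:
        long_ = (count - b) / (d - b)
    else:
        long_ = 1.0

    return {"short": short, "medium": 1.0 - short - long_, "long": long_}

page = (
    "<div><a href='/x'>Home</a></div>\n"
    "<p>The quick brown fox jumps over the lazy dog today</p>"
)
counts = token_counts(page)
labels = [fuzzy_memberships(c) for c in counts]
```

Under these assumptions, a navigation line yields a low count and a paragraph line a higher one, so a sequence of per-line memberships could serve as input to a sequential pattern miner without depending on which HTML tags the page uses.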

Cite

Huang, T., Huang, R., Liu, B., & Yan, Y. (2017). Extracting various types of informative web content via fuzzy sequential pattern mining. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10366 LNCS, pp. 230–238). Springer Verlag. https://doi.org/10.1007/978-3-319-63579-8_18
