Web contents extracting for web-based learning

Jiangtao Qiu; Changjie Tang; Kaikuo Xu; Qian Luo

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2008), Vol. 5145 LNCS, pp. 59–68

DOI: 10.1007/978-3-540-85033-5_7

Abstract

Web mining has been applied to improve web-based learning. Content-based web mining usually focuses on the main content of a web page. This paper proposes a novel approach to automatically extract main content from web pages. Compared with existing studies, the method can determine whether a web page contains main content and then extract it without using a DOM tree or templates. The main contributions are: (1) introducing a new concept, the Block, and proposing a method to partition a web page into blocks, so that main content and noise content can be well separated into different blocks; (2) introducing the concept of Web Page Block Distribution and studying its features. Based on the Block Distribution, the method can effectively determine whether a web page contains main content and then extract the main content via outlier analysis. Experiments demonstrate the utility and feasibility of the method. © 2008 Springer-Verlag Berlin Heidelberg.
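
The abstract does not spell out how blocks are formed or which outlier test is applied, so the following Python sketch only illustrates the general idea and is not the authors' algorithm. It assumes that a block is the text between consecutive block-level tags (found with a regular expression rather than a DOM tree), that the block distribution is simply the list of block text lengths, and that main content shows up as blocks whose length is an outlier under a median/MAD modified z-score. The names `partition_blocks`, `extract_main_content`, and the threshold of 3.5 are hypothetical choices made for this example.

```python
import re
import statistics
from html import unescape

# Assumption: block boundaries are block-level tags, matched textually (no DOM tree).
BLOCK_TAG = re.compile(
    r"</?(?:div|p|td|tr|table|li|ul|ol|h[1-6]|br|section|article)\b[^>]*>", re.I
)
ANY_TAG = re.compile(r"<[^>]+>")
SCRIPT_STYLE = re.compile(r"<(script|style)\b.*?</\1\s*>", re.I | re.S)


def partition_blocks(html):
    """Split a page into text blocks at block-level tag boundaries."""
    html = SCRIPT_STYLE.sub(" ", html)  # drop script and style bodies
    blocks = []
    for chunk in BLOCK_TAG.split(html):
        # Remove remaining inline tags, decode entities, normalise whitespace.
        text = " ".join(unescape(ANY_TAG.sub(" ", chunk)).split())
        if text:
            blocks.append(text)
    return blocks


def extract_main_content(html, threshold=3.5):
    """Return blocks whose text length is an upper outlier.

    An empty list is read as "this page has no main content".
    Assumption: outliers are detected with the modified z-score
    0.6745 * (length - median) / MAD; the paper may use another test.
    """
    blocks = partition_blocks(html)
    if not blocks:
        return []
    lengths = [len(b) for b in blocks]
    median = statistics.median(lengths)
    mad = statistics.median(abs(n - median) for n in lengths)
    mad = max(mad, 1.0)  # avoid division by zero when lengths are uniform
    return [
        b for b, n in zip(blocks, lengths)
        if 0.6745 * (n - median) / mad > threshold
    ]
```

Calling `extract_main_content(page_html)` on a page with a few short navigation items and one long paragraph should return only the long paragraph, while a page made up solely of short navigation items yields an empty list. A robust statistic such as the median is used here because a single long block would otherwise inflate a mean-based threshold.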

Citation (APA)

Qiu, J., Tang, C., Xu, K., & Luo, Q. (2008). Web contents extracting for web-based learning. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 5145 LNCS, pp. 59–68). https://doi.org/10.1007/978-3-540-85033-5_7
