Complete-thread extraction from web forums

Abstract

This paper proposes an algorithm that automatically extracts the complete meta-information of threads from a variety of forums. The algorithm has two steps: thread extraction from board pages, and detailed information extraction from thread pages. In the first step, board pages are divided into five types according to their structure, and a corresponding extraction algorithm and model is proposed for each type. In the second step, a method is first applied to identify the content of the original post; the remaining fields of the original post, which are always located around the content, are then matched by regular patterns; and a trained model extracts the reply posts. Experiments show that the proposed algorithm is accurate and effective. © 2012 Springer-Verlag Berlin Heidelberg.
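
As a rough illustration of the second step, the sketch below shows how fields located around an already identified post content might be matched with regular patterns. It is not the authors' implementation: the abstract does not give the actual patterns, so `DATE_RE`, `AUTHOR_RE`, the `window` parameter, the function name `extract_nearby_fields`, and the sample text blocks are all assumptions made for this example.

```python
import re
from typing import Optional

# Hypothetical patterns: the abstract does not list the real expressions.
DATE_RE = re.compile(r"\b(\d{4}-\d{2}-\d{2})\b")
AUTHOR_RE = re.compile(r"\b(?:posted by|author)\b\s*:?\s*(\w+)", re.IGNORECASE)


def extract_nearby_fields(
    blocks: list[str], content_idx: int, window: int = 2
) -> dict[str, Optional[str]]:
    """Search the text blocks surrounding the post content for author and date.

    `blocks` is the page text split into blocks, and `content_idx` is the
    index of the block already identified as the post content.
    """
    fields: dict[str, Optional[str]] = {"author": None, "date": None}
    lo = max(0, content_idx - window)
    hi = min(len(blocks), content_idx + window + 1)
    # Visit neighbouring blocks from nearest to farthest, skipping the content.
    neighbours = sorted(
        (i for i in range(lo, hi) if i != content_idx),
        key=lambda i: abs(i - content_idx),
    )
    for i in neighbours:
        if fields["author"] is None:
            match = AUTHOR_RE.search(blocks[i])
            if match:
                fields["author"] = match.group(1)
        if fields["date"] is None:
            match = DATE_RE.search(blocks[i])
            if match:
                fields["date"] = match.group(1)
    return fields


# Made-up sample page, split into text blocks.
sample_blocks = [
    "Forum > Board",
    "Posted by: alice",
    "2012-03-05 14:20",
    "How do I parse forum pages?",
    "Reply",
    "Quote",
]
result = extract_nearby_fields(sample_blocks, content_idx=3)
```

Restricting the search to a small window around the content reflects the abstract's observation that the remaining fields sit close to it. A real system would need patterns tuned to each forum's date and author formats.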

Hu, F., Ruan, T., & Shao, Z. (2012). Complete-thread extraction from web forums. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7235 LNCS, pp. 727–734). https://doi.org/10.1007/978-3-642-29253-8_70
