Juicer: Scalable extraction for thread meta-information of web forum

Yan Guo; Yu Wang; Guodong Ding; Donglin Cao; Gang Zhang; Yi Lv

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2009) 5477 143-148

DOI: 10.1007/978-3-642-01393-5_15

Abstract

In Web forums, the thread meta-information contained in the list-of-thread of a board page provides fundamental data for further forum mining. This paper describes a complete system named Juicer, which was developed as a subsystem of an industrial application that involves forum mining. The task of Juicer is to extract thread meta-information from the board pages of a great many large-scale online Web forums, which requires scalable extraction with high accuracy and speed, and minimal user effort for maintenance. Among the many existing approaches to information extraction, we could not find any that fully satisfies these requirements, so we present the simple, scalable extraction approach behind Juicer to achieve this goal. Juicer consists of four modules: Template generation, Specifying labeling setting, Automatic extraction, and Label assignment. Both experiments and practical use show that Juicer successfully satisfies the requirements.
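The abstract names Juicer's four modules but not their internals. The Python sketch below is a hypothetical illustration of how such a four-stage pipeline could fit together, not the authors' method. It assumes that a board page lists threads as `<tr>` rows, treats a "template" as the XPath-like tag paths that occur in every row, and represents the labeling setting as a small user-supplied map from template slot to field name. The names `RowCollector`, `generate_template`, `extract`, `assign_labels`, and the sample page are invented for this example.

```python
from html.parser import HTMLParser

# Assumption: void elements never get a closing tag, so they are not pushed.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link", "wbr"}


class RowCollector(HTMLParser):
    """Collects (path, text) pairs for every non-empty text node inside each <tr>.

    Paths are XPath-like ("td[0]/a[0]"): each step is a tag name plus its
    0-based index among same-named siblings, relative to the enclosing <tr>.
    Nested tables are not handled in this sketch.
    """

    def __init__(self):
        super().__init__()
        self.rows = []
        self.current = None  # (path, text) pairs of the row being parsed
        self.stack = []      # entries: (tag, index, per-tag child counts)

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.current, self.stack = [], [("tr", 0, {})]
        elif self.current is not None and tag not in VOID_TAGS:
            counts = self.stack[-1][2]
            index = counts.get(tag, 0)
            counts[tag] = index + 1
            self.stack.append((tag, index, {}))

    def handle_endtag(self, tag):
        if self.current is None:
            return
        if tag == "tr":
            self.rows.append(self.current)
            self.current, self.stack = None, []
        elif len(self.stack) > 1 and self.stack[-1][0] == tag:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if self.current is not None and text:
            path = "/".join(f"{t}[{i}]" for t, i, _ in self.stack[1:])
            self.current.append((path, text))


def generate_template(rows):
    """Template generation: paths present in every row, in first-row order."""
    if not rows:
        return []
    common = set.intersection(*(set(p for p, _ in row) for row in rows))
    return list(dict.fromkeys(p for p, _ in rows[0] if p in common))


def extract(rows, template):
    """Automatic extraction: one value per template slot; incomplete rows are skipped."""
    records = []
    for row in rows:
        first = {}
        for path, text in row:
            first.setdefault(path, text)
        if all(p in first for p in template):
            records.append([first[p] for p in template])
    return records


def assign_labels(records, label_setting):
    """Label assignment: map template slot positions to user-specified labels."""
    return [{label: rec[slot] for slot, label in label_setting.items()} for rec in records]


# Hypothetical board page; the second row carries an extra "[sticky]" marker.
BOARD_PAGE = """
<table>
<tr><td><a href="/t/1">Welcome thread</a></td><td>alice</td><td>12</td></tr>
<tr><td><span>[sticky]</span><a href="/t/2">Rules</a></td><td>bob</td><td>3</td></tr>
<tr><td><a href="/t/3">Juicer question</a></td><td>carol</td><td>0</td></tr>
</table>
"""

parser = RowCollector()
parser.feed(BOARD_PAGE)
parser.close()
template = generate_template(parser.rows)
# Hypothetical labeling setting, specified once per forum by a user.
label_setting = {0: "title", 1: "author", 2: "replies"}
threads = assign_labels(extract(parser.rows, template), label_setting)
for thread in threads:
    print(thread)
```

In this sketch the only per-forum input is the small label map, which mirrors the abstract's goal of minimal maintenance effort; the optional `[sticky]` marker is dropped because its path does not occur in every row. A real system would also need to handle pagination, nested tables, and rows whose layout departs from the majority.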

Citation (APA)

Guo, Y., Wang, Y., Ding, G., Cao, D., Zhang, G., & Lv, Y. (2009). Juicer: Scalable extraction for thread meta-information of web forum. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 5477, pp. 143–148). https://doi.org/10.1007/978-3-642-01393-5_15