Feature selection for language independent text forum summarization

Vladislav A. Grozin; Natalia F. Gusarova; Natalia V. Dobrenko

Conference Proceedings

Communications in Computer and Information Science (2015), Vol. 518, pp. 63–71

DOI: 10.1007/978-3-319-24543-0_5

Abstract

Nowadays, the need for multilingual information retrieval is rising steadily. Specialized text-based forums on the Web are a valuable source of relevant information. However, extracting informative messages is often hindered by the large number of non-informative posts (so-called offtopic posts) and by the informal language commonly used on forums. The paper deals with the task of automatically identifying posts that are potentially useful for sharing professional experience within text forums, irrespective of the forum's language. For our experiments, we selected subsets from various text forums in different languages. Manual annotation was performed by native-speaking experts. Textual, thread-based, and social graph features were extracted. To select satisfactory language-independent forum features, we used gradient boosting models, the relative influence metric for model analysis, and the NDCG metric for measuring the quality of the selection method. We have formed a satisfactory set of forum features that indicate a post's utility, do not require sophisticated linguistic analysis, and are suitable for practical use.
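The abstract names NDCG (normalized discounted cumulative gain) as the measure of selection-method quality but does not say exactly how it was applied. The sketch below assumes one plausible setup: a model trained on a candidate feature subset scores each post, the posts are ranked by that score, and the ranking is judged against graded expert labels. It uses the common exponential-gain variant of DCG, (2^rel - 1) / log2(rank + 1). The function names, the label scale, and the example data are hypothetical illustrations, not the authors' code or results.

```python
import math
from typing import Optional, Sequence


def dcg(relevances: Sequence[float], k: Optional[int] = None) -> float:
    """Discounted cumulative gain using exponential gain (2**rel - 1)."""
    top = relevances[:k] if k is not None else relevances
    # Ranks start at 1, so the item at 0-based index i is discounted by log2(i + 2).
    return sum((2 ** rel - 1) / math.log2(i + 2) for i, rel in enumerate(top))


def ndcg(scores: Sequence[float], labels: Sequence[float], k: Optional[int] = None) -> float:
    """NDCG of the ranking induced by model `scores`, judged by expert `labels`."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    # Reorder the expert labels by descending model score (the system's ranking).
    ranked = [lab for _, lab in sorted(zip(scores, labels), key=lambda p: -p[0])]
    # The ideal ranking places the most useful posts first.
    ideal = sorted(labels, reverse=True)
    best = dcg(ideal, k)
    return dcg(ranked, k) / best if best > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical data: utility scores from some model for five forum posts,
    # with graded expert labels (0 = offtopic, 2 = highly useful).
    scores = [0.9, 0.2, 0.75, 0.1, 0.4]
    labels = [2, 0, 1, 0, 2]
    print(round(ndcg(scores, labels, k=3), 4))
```

Under this assumed setup, feature subsets could be compared by the NDCG their models achieve: a subset whose ranking stays close to the experts' ideal ordering scores near 1.0, while one that pushes offtopic posts upward scores lower.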

Citation (APA)

Grozin, V. A., Gusarova, N. F., & Dobrenko, N. V. (2015). Feature selection for language independent text forum summarization. In Communications in Computer and Information Science (Vol. 518, pp. 63–71). Springer Verlag. https://doi.org/10.1007/978-3-319-24543-0_5