Feature selection for language independent text forum summarization

Abstract

The need for multilingual information retrieval is rising steadily. Specialized text-based forums on the Web are a valuable source of such information. However, the extraction of informative messages is often hindered by the large number of non-informative (so-called off-topic) posts and by the informal language commonly used on forums. This paper addresses the task of automatically identifying posts that are potentially useful for sharing professional experience within text forums, irrespective of the forum’s language. For our experiments we selected subsets from various text forums in different languages. Manual annotation was performed by native-speaking experts. Textual, thread-based, and social graph features were extracted. To select satisfactory language-independent forum features, we used gradient boosting models, the relative influence metric for model analysis, and the NDCG metric for measuring the quality of the selection method. We have formed a satisfactory set of forum features that indicate a post’s utility, do not demand sophisticated linguistic analysis, and are suitable for practical use.
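The abstract names NDCG as the quality measure but does not say which variant was used. As a rough illustration only, the sketch below computes NDCG with a linear gain and a log2 rank discount, which is one common form. The function names `dcg` and `ndcg`, the optional cutoff `k`, and the toy scores and labels are all assumptions made for this example, not details taken from the paper.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain with linear gain and log2 rank discount."""
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances, start=1)
    )


def ndcg(scores, labels, k=None):
    """NDCG of the ranking induced by `scores` against expert `labels`.

    `scores` are model outputs (higher means "more useful post"),
    `labels` are the manually assigned usefulness grades.
    `k` is an optional cutoff; None means the whole list is scored.
    """
    # Rank posts by descending predicted score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranked = [labels[i] for i in order][:k]
    # The ideal ranking puts the most useful posts first.
    ideal = sorted(labels, reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    # With no useful posts at all, NDCG is undefined; return 0.0 by convention.
    return dcg(ranked) / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical toy data: four posts, binary usefulness labels.
example_scores = [0.9, 0.2, 0.7, 0.1]
example_labels = [1, 0, 1, 0]
example_ndcg = ndcg(example_scores, example_labels)
```

In a feature selection setting, a value like `example_ndcg` could be recomputed for each candidate feature subset, so that subsets are compared by how well the resulting model ranks useful posts near the top.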

Cite

Grozin, V. A., Gusarova, N. F., & Dobrenko, N. V. (2015). Feature selection for language independent text forum summarization. In Communications in Computer and Information Science (Vol. 518, pp. 63–71). Springer Verlag. https://doi.org/10.1007/978-3-319-24543-0_5
