Characterizing user-generated text content mining: A systematic mapping study of the Portuguese language

Ellen Souza; Dayvid Castro; Douglas Vitório; Ingryd Teles; Adriano L.I. Oliveira; Cristine Gusmão

Conference Proceedings

Advances in Intelligent Systems and Computing (2016) 444 1015-1024

DOI: 10.1007/978-3-319-31232-3_96

4 Citations

19 Readers

Abstract

Unstructured data accounts for more than 80% of enterprise data and is growing exponentially at a rate of 60% per year. Text mining refers to the process of discovering new, previously unknown, and potentially useful information from a variety of unstructured data, including user-generated text content (UGTC). Given that Portuguese is one of the most widely used languages in the world and the second most frequent language on Twitter, the goal of this work is to map the landscape of current studies that relate the application of text mining to UGTC in the Portuguese language. The systematic mapping review method was applied to search for, select, and extract data from the included studies. Our manual and automated searches retrieved 6075 studies published up to 2014, of which 35 were included in the study. Text classification accounts for 79% of all text mining tasks, with Naïve Bayes as the main classifier and Twitter as the main data source.

Cite (APA)

Souza, E., Castro, D., Vitório, D., Teles, I., Oliveira, A. L. I., & Gusmão, C. (2016). Characterizing user-generated text content mining: A systematic mapping study of the Portuguese language. In Advances in Intelligent Systems and Computing (Vol. 444, pp. 1015–1024). Springer Verlag. https://doi.org/10.1007/978-3-319-31232-3_96
