Huge annotated corpora are relevant for many Natural Language Processing tasks such as Sentiment Analysis. However, a manual and more precise annotation is always costly and becomes prohibitive when the corpus is too large. This paper presents a semi-supervised learning based framework for extending sentiment annotated corpora with unlabeled data, named CasSUL. The framework was used to extend in eight times TTsBR, a corpus of 15.000 tweets in Brazilian Portuguese manually annotated in three polarity classes. The extended annotated corpus was used to train several polarity classifiers and the results show that some combinations of classifier and features can preserve the annotation quality of the original corpus in the resulting corpus.
CITATION STYLE
Brum, H. B., & Nunes, M. das G. V. (2018). Semi-supervised Sentiment Annotation of Large Corpora. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11122 LNAI, pp. 385–395). Springer Verlag. https://doi.org/10.1007/978-3-319-99722-3_39
Mendeley helps you to discover research relevant for your work.