Semi-supervised Sentiment Annotation of Large Corpora

Abstract

Large annotated corpora are valuable for many Natural Language Processing tasks, such as Sentiment Analysis. However, manual annotation, although more precise, is always costly and becomes prohibitive when the corpus is too large. This paper presents CasSUL, a framework based on semi-supervised learning for extending sentiment-annotated corpora with unlabeled data. The framework was used to extend TTsBR, a corpus of 15,000 tweets in Brazilian Portuguese manually annotated with three polarity classes, eightfold. The extended annotated corpus was used to train several polarity classifiers, and the results show that some combinations of classifier and features can preserve the annotation quality of the original corpus in the resulting corpus.

Citation

Brum, H. B., & Nunes, M. das G. V. (2018). Semi-supervised Sentiment Annotation of Large Corpora. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11122 LNAI, pp. 385–395). Springer Verlag. https://doi.org/10.1007/978-3-319-99722-3_39
