Semi-supervised Sentiment Annotation of Large Corpora

Henrico Bertini Brum; Maria das Graças Volpe Nunes

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2018) 11122 LNAI 385-395

DOI: 10.1007/978-3-319-99722-3_39

6 Citations

9 Readers

Abstract

Huge annotated corpora are relevant for many Natural Language Processing tasks such as Sentiment Analysis. However, manual annotation, although more precise, is always costly and becomes prohibitive when the corpus is too large. This paper presents CasSUL, a framework based on semi-supervised learning for extending sentiment-annotated corpora with unlabeled data. The framework was used for an eightfold extension of TTsBR, a corpus of 15,000 tweets in Brazilian Portuguese manually annotated with three polarity classes. The extended annotated corpus was used to train several polarity classifiers, and the results show that some combinations of classifier and features can preserve the annotation quality of the original corpus in the resulting corpus.

Cite

Brum, H. B., & Nunes, M. das G. V. (2018). Semi-supervised Sentiment Annotation of Large Corpora. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11122 LNAI, pp. 385–395). Springer Verlag. https://doi.org/10.1007/978-3-319-99722-3_39
