Predicting sentiment of Polish language short texts

Abstract

The goal of this paper is to use all available Polish language data sets to seek the best possible performance in supervised sentiment analysis of short texts. We use text collections with labeled sentiment such as tweets, movie reviews and a sentiment treebank, in three comparison modes. In the first, we examine the performance of models trained and tested on the same text collection using standard cross-validation (in-domain). In the second, we train models on all available data except the given test collection, which we use for testing (one vs rest cross-domain). In the third, we train a model on one data set and apply it to another one (one vs one cross-domain). We compare a wide range of methods, including machine learning on bag-of-words representations, bidirectional recurrent neural networks, and the most recent pre-trained architectures ELMo and BERT. We formulate conclusions as to the cross-domain and in-domain performance of each method. Unsurprisingly, BERT turned out to be a strong performer, especially in the cross-domain setting. What is surprising, however, is the solid performance of the relatively simple multinomial Naive Bayes classifier, which performed as well as BERT on several data sets.
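To make the "one vs one cross-domain" protocol and the bag-of-words baseline concrete, here is a minimal sketch: a multinomial Naive Bayes classifier with Laplace smoothing, trained on word counts from one collection and evaluated on documents from a different one. The toy English data below is invented for illustration only; the paper works with Polish tweets, movie reviews, and a sentiment treebank, and its actual pipeline is not shown here.

```python
# Hypothetical sketch of multinomial Naive Bayes on a bag-of-words
# representation, evaluated cross-domain (train on one collection,
# test on another). All data below is invented toy data.
from collections import Counter, defaultdict
import math

def train_mnb(docs, labels, alpha=1.0):
    """Fit multinomial NB with add-alpha (Laplace) smoothing on tokenized docs."""
    vocab = set(w for d in docs for w in d)
    counts = defaultdict(Counter)        # label -> word frequency counts
    priors = Counter(labels)
    for d, y in zip(docs, labels):
        counts[y].update(d)
    model = {}
    for y in priors:
        total = sum(counts[y].values())
        denom = total + alpha * len(vocab)
        model[y] = {
            "prior": math.log(priors[y] / len(labels)),
            "loglik": {w: math.log((counts[y][w] + alpha) / denom) for w in vocab},
            "unk": math.log(alpha / denom),   # score for out-of-vocabulary words
        }
    return model

def predict(model, doc):
    """Return the label maximizing log prior + summed word log-likelihoods."""
    def score(y):
        m = model[y]
        return m["prior"] + sum(m["loglik"].get(w, m["unk"]) for w in doc)
    return max(model, key=score)

# Train on one "domain" ...
train_docs = [["great", "movie"], ["awful", "plot"],
              ["loved", "it"], ["boring", "film"]]
train_labels = ["pos", "neg", "pos", "neg"]
model = train_mnb(train_docs, train_labels)

# ... and test on documents from another, with different word co-occurrences.
test_docs = [["great", "loved", "plot"], ["awful", "boring", "movie"]]
preds = [predict(model, d) for d in test_docs]  # -> ["pos", "neg"]
```

Because the classifier relies only on per-word statistics, it transfers across domains whenever sentiment-bearing words overlap, which is one plausible reason for the baseline's solid cross-domain showing reported in the abstract.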

Citation (APA)
Wawer, A., & Sobiczewska, J. (2019). Predicting sentiment of Polish language short texts. In International Conference Recent Advances in Natural Language Processing, RANLP (Vol. 2019-September, pp. 1321–1327). Incoma Ltd. https://doi.org/10.26615/978-954-452-056-4_151
