Review of non-English corpora annotated for emotion classification in text

Viktorija Leonova

Conference Proceedings

Leonova V

Communications in Computer and Information Science (2020) 1243 CCIS 96-108

DOI: 10.1007/978-3-030-57672-1_8

Abstract

In this paper we try to systematize the information about the available corpora for emotion classification in text for languages other than English, with the goal of finding which approaches could be used for low-resource languages with close to no existing work in the field. We analyze the volume, emotion classification schema, and language of each corpus, as well as the methods employed for data preparation and annotation automation. We have systematized twenty-four papers presenting such corpora and found that the corpora mostly cover the world's most widely spoken languages: Hindi, Chinese, Turkish, Arabic, Japanese, etc. A typical corpus contains several thousand manually annotated entries collected from a social network, each annotated by three annotators, and has been processed by a few machine learning methods, such as linear SVM and Naïve Bayes, and (in more recent work) a couple of neural network methods, such as CNN.

APA

Leonova, V. (2020). Review of non-English corpora annotated for emotion classification in text. In Communications in Computer and Information Science (Vol. 1243 CCIS, pp. 96–108). Springer. https://doi.org/10.1007/978-3-030-57672-1_8
