Exploring Textual Features for Multi-label Classification of Portuguese Film Synopses

Giuseppe Portolese; Marcos Aurélio Domingues; Valéria Delisandra Feltrim

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2019) 11805 LNAI 669-681

DOI: 10.1007/978-3-030-30244-3_55

Abstract

The multi-label classification of film genres using features extracted from their synopses has recently gained some attention from the scientific community; however, the number of studies is still limited. Such studies are even scarcer for languages other than English. In this work we present the P-TMDb dataset, which contains 13,394 Portuguese film synopses, and explore film genre classification by experimenting with nine different groups of textual features and four multi-label algorithms. As our dataset is unbalanced, we also conducted experiments with an oversampled version of the dataset. The best result for the original dataset was achieved by a TF-IDF-based classifier, with an average F1 score of 0.478, while the best result for the oversampled dataset was achieved by a combination of several feature groups, with an average F1 score of 0.611.
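The abstract reports an "average F1 score" without stating, on this page, how the average is taken. As a minimal illustration only, the sketch below assumes a macro average: F1 is computed separately for each genre label and the per-label scores are then averaged. The function names `per_label_f1` and `macro_f1` and the toy genre labels are hypothetical and are not taken from the paper or from P-TMDb.

```python
from typing import Dict, List, Set


def per_label_f1(true_sets: List[Set[str]], pred_sets: List[Set[str]]) -> Dict[str, float]:
    """Compute F1 for each label, treating every label as its own binary task."""
    labels = set().union(*true_sets, *pred_sets)
    scores: Dict[str, float] = {}
    for label in sorted(labels):
        pairs = list(zip(true_sets, pred_sets))
        tp = sum(1 for t, p in pairs if label in t and label in p)
        fp = sum(1 for t, p in pairs if label not in t and label in p)
        fn = sum(1 for t, p in pairs if label in t and label not in p)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the label never occurs.
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(true_sets: List[Set[str]], pred_sets: List[Set[str]]) -> float:
    """Average the per-label F1 scores (macro averaging is an assumption here)."""
    scores = per_label_f1(true_sets, pred_sets)
    return sum(scores.values()) / len(scores) if scores else 0.0


# Toy data with made-up genre labels; each film may carry several genres.
y_true = [{"drama"}, {"comedy", "romance"}, {"drama", "romance"}]
y_pred = [{"drama"}, {"comedy"}, {"drama", "comedy"}]

print(per_label_f1(y_true, y_pred))
print(round(macro_f1(y_true, y_pred), 3))
```

With macro averaging, a rare genre that is never predicted pulls the average down as much as a frequent one would, which is one reason class imbalance and oversampling can have a visible effect on this kind of score.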

Cite

Portolese, G., Domingues, M. A., & Feltrim, V. D. (2019). Exploring Textual Features for Multi-label Classification of Portuguese Film Synopses. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11805 LNAI, pp. 669–681). Springer Verlag. https://doi.org/10.1007/978-3-030-30244-3_55
