The aim of this paper is to categorize movies into genres using the previews. Our study attempts to combine audio, visual and text features to classify a collection of movie previews into action, biography, comedy, and horror. For each of the collected previews, the audio and visual features are extracted and the text features are drawn from social tags via social websites. The probabilistic latent semantic analysis (PLSA) is used to incorporate the features from these three different aspects of information. The standard PLSA processes one type of information only. Therefore double-model and triple-model PLSAs are extended in order to combine two or three different types of information. We compare these various variants of PLSA approaches with unimodal PLSAs, which use either audio, visual or text features only. The experimental results show not only that one of the triple-model PLSAs achieves the highest accuracy, but also that social tags (text features) play an important role for classifying movies genres.
CITATION STYLE
Hong, H. Z., & Hwang, J. I. G. (2015). Multimodal PLSA for movie genre classification. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9132, pp. 159–167). Springer Verlag. https://doi.org/10.1007/978-3-319-20248-8_14
Mendeley helps you to discover research relevant for your work.