Multimodal music mood classification by fusion of audio and lyrics

Abstract

Mood analysis from music data has attracted increasing research and application attention in recent years. In this paper, we propose a novel multimodal approach to music mood classification that incorporates audio and lyric information and consists of three key components: 1) lyric feature extraction with a recursive hierarchical deep learning model, preceded by lyric filtering with discriminative vocabulary reduction and synonymous lyric expansion; 2) saliency-based audio feature extraction; and 3) a Hough-forest-based fusion and classification scheme that fuses the two modalities at the finer-grained sentence level, exploiting the time alignment across modalities. The effectiveness of the proposed model is verified by experiments on a real dataset containing more than 3000 minutes of music.
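The third component fuses the two modalities at the sentence level using the time alignment between lyrics and audio. A minimal sketch of that alignment step (not the authors' code; the feature values, timestamps, and the overlap heuristic are illustrative assumptions) might pair each lyric sentence with its most-overlapping audio segment and concatenate the feature vectors:

```python
def fuse_sentence_features(lyric_feats, audio_feats):
    """Fuse per-sentence lyric and audio features via time alignment.

    lyric_feats: list of (start_sec, end_sec, feature_list), one per lyric sentence
    audio_feats: list of (start_sec, end_sec, feature_list), one per audio segment
    Returns one concatenated feature vector per lyric sentence, pairing each
    sentence with the audio segment that overlaps it most in time.
    """
    fused = []
    for l_start, l_end, l_vec in lyric_feats:
        # Pick the audio segment with maximum temporal overlap.
        best = max(audio_feats,
                   key=lambda seg: min(l_end, seg[1]) - max(l_start, seg[0]))
        fused.append(l_vec + best[2])  # concatenate lyric and audio features
    return fused


# Illustrative toy input: two lyric sentences, two audio segments.
lyrics = [(0.0, 2.0, [0.1, 0.2]), (2.0, 4.0, [0.3, 0.4])]
audio = [(0.0, 2.5, [0.9]), (2.5, 4.0, [0.8])]
print(fuse_sentence_features(lyrics, audio))
# → [[0.1, 0.2, 0.9], [0.3, 0.4, 0.8]]
```

The fused sentence-level vectors would then feed the classifier; the paper uses a Hough forest for that stage, which this sketch does not reproduce.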

Citation (APA)

Xue, H., Xue, L., & Su, F. (2015). Multimodal music mood classification by fusion of audio and lyrics. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 8936, pp. 26–37). Springer Verlag. https://doi.org/10.1007/978-3-319-14442-9_3
