Multimodal music mood classification by fusion of audio and lyrics

Abstract

Mood analysis from music data has attracted increasing research and application attention in recent years. In this paper, we propose a novel multimodal approach to music mood classification that incorporates audio and lyric information and consists of three key components: 1) lyric feature extraction with a recursive hierarchical deep learning model, preceded by lyric filtering with discriminative vocabulary reduction and synonymous lyric expansion; 2) saliency-based audio feature extraction; and 3) a Hough-forest-based fusion and classification scheme that fuses the two modalities at the finer-grained sentence level, exploiting the time alignment across modalities. The effectiveness of the proposed model is verified by experiments on a real dataset containing more than 3000 minutes of music.
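The third component fuses the two modalities at the sentence level using the time alignment between lyrics and audio. A minimal sketch of that alignment step (not the authors' code; the feature values, timestamps, and the overlap heuristic are illustrative assumptions) might pair each lyric sentence with its most-overlapping audio segment and concatenate the feature vectors:

```python
def fuse_sentence_features(lyric_feats, audio_feats):
    """Fuse per-sentence lyric and audio features via time alignment.

    lyric_feats: list of (start_sec, end_sec, feature_list), one per lyric sentence
    audio_feats: list of (start_sec, end_sec, feature_list), one per audio segment
    Returns one concatenated feature vector per lyric sentence, pairing each
    sentence with the audio segment that overlaps it most in time.
    """
    fused = []
    for l_start, l_end, l_vec in lyric_feats:
        # Pick the audio segment with maximum temporal overlap.
        best = max(audio_feats,
                   key=lambda seg: min(l_end, seg[1]) - max(l_start, seg[0]))
        fused.append(l_vec + best[2])  # concatenate lyric and audio features
    return fused


# Illustrative toy input: two lyric sentences, two audio segments.
lyrics = [(0.0, 2.0, [0.1, 0.2]), (2.0, 4.0, [0.3, 0.4])]
audio = [(0.0, 2.5, [0.9]), (2.5, 4.0, [0.8])]
print(fuse_sentence_features(lyrics, audio))
# → [[0.1, 0.2, 0.9], [0.3, 0.4, 0.8]]
```

The fused sentence-level vectors would then feed the classifier; the paper uses a Hough forest for that stage, which this sketch does not reproduce.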

Citation (APA)

Xue, H., Xue, L., & Su, F. (2015). Multimodal music mood classification by fusion of audio and lyrics. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 8936, pp. 26–37). Springer Verlag. https://doi.org/10.1007/978-3-319-14442-9_3
