The explosive growth and ubiquitous accessibility of digital music motivate content analysis of music objects. Music emotion, a significant component of affective content, conveys high-level semantics of music objects. Given proper features, music emotion recognition can be formulated either as a regression problem or as a classification task, in which emotions are represented as vectors in a dimensional space or as categorical tags, respectively. In this paper, we propose a machine learning-based approach to predict the dimensional emotion of music objects without a sophisticated feature-extraction process. The exponential frequency resolution of the Constant-Q Transform mirrors the human auditory system. We first apply the Constant-Q Transform to music objects to derive the spectrogram, a visual representation of the audio signal. We then apply a convolutional neural network, an architecture widely and successfully used for visual image analysis, to the spectrogram to predict music emotion. Experimental results show that our approach is promising and effective: using ten-fold cross-validation, it achieves a Top-3 accuracy of $$82.24\%$$ in the valence dimension and $$81.80\%$$ in the arousal dimension.
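The Constant-Q Transform's exponentially spaced frequency bins can be sketched as follows. This is a minimal, illustrative implementation in plain NumPy, not the authors' pipeline: the function name, parameter defaults (e.g. `fmin = 32.70` Hz, 12 bins per octave), and the toy 440 Hz test signal are all assumptions introduced here for illustration.

```python
import numpy as np

def cqt_spectrogram(y, sr=22050, fmin=32.70, bins_per_octave=12,
                    n_bins=48, hop=512):
    """Naive Constant-Q spectrogram: bin center frequencies are spaced
    exponentially, and each bin's analysis window is inversely
    proportional to its frequency (constant Q = f_k / bandwidth)."""
    Q = 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)
    freqs = fmin * 2.0 ** (np.arange(n_bins) / bins_per_octave)
    n_frames = 1 + len(y) // hop
    C = np.zeros((n_bins, n_frames))
    for k, fk in enumerate(freqs):
        Nk = int(np.ceil(Q * sr / fk))      # longer windows at low frequencies
        t = np.arange(Nk)
        # Hann-windowed complex exponential kernel for bin k
        kernel = np.hanning(Nk) * np.exp(-2j * np.pi * fk * t / sr) / Nk
        for m in range(n_frames):
            frame = y[m * hop : m * hop + Nk]
            if len(frame) < Nk:             # zero-pad the last frames
                frame = np.pad(frame, (0, Nk - len(frame)))
            C[k, m] = np.abs(np.dot(kernel, frame))
    return freqs, C

# Toy input: one second of a 440 Hz sine; the energy should concentrate
# in the bin whose center frequency is closest to 440 Hz.
sr = 22050
y = np.sin(2 * np.pi * 440.0 * np.arange(sr) / sr)
freqs, C = cqt_spectrogram(y, sr=sr)
peak_bin = C.mean(axis=1).argmax()
```

In practice one would use an optimized library routine (e.g. a dedicated CQT implementation) rather than this direct inner-product form; the sketch only shows why the representation has fine frequency resolution at low pitches and fine time resolution at high pitches, matching the human auditory system as noted above.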
Yang, P. T., Kuang, S. M., Wu, C. C., & Hsu, J. L. (2020). Predicting music emotion by using convolutional neural network. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 12204 LNCS, pp. 266–275). Springer. https://doi.org/10.1007/978-3-030-50341-3_21