The aim of this paper was to determine which combination of audio features gives the best performance in music emotion detection. Emotion recognition was treated as a regression problem, and a two-dimensional valence-arousal model was used to measure emotions in music. Features were extracted with Essentia and Marsyas, tools for audio analysis and audio-based music information retrieval. The influence of different feature sets (low-level, rhythm, tonal, and their combination) on arousal and valence prediction was examined. Using a combination of different types of features significantly improves the results compared with using just one group of features. Features particularly suited to detecting arousal and valence separately, as well as features useful in both cases, were identified and presented. This paper also presents the process of building emotion maps of musical compositions. The obtained emotion maps show how emotions are distributed across an examined audio recording, knowledge that had previously been available only to music experts.
CITATION STYLE
Grekow, J. (2018). Audio features dedicated to the detection and tracking of arousal and valence in musical compositions. Journal of Information and Telecommunication, 2(3), 322–333. https://doi.org/10.1080/24751839.2018.1463749