This paper proposes a novel machine learning approach for the task of on-line continuous-time music mood regression, i.e., low-latency prediction of the time-varying arousal and valence in musical pieces. On the front-end, a large set of segmental acoustic features is extracted to model short-term variations. Then, multivariate regression is performed by deep recurrent neural networks to model longer-range context and capture the time-varying emotional profile of musical pieces appropriately. Evaluation is done on the 2013 MediaEval Challenge corpus, consisting of 1 000 pieces annotated by crowdsourcing with continuous-time, continuous-valued arousal and valence ratings. In the results, recurrent neural networks outperform SVR and feedforward neural networks in both continuous-time and static music mood regression, and achieve an R² of up to .70 and .50 for arousal and valence annotations, respectively.
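The R² figures above can be made concrete with a small worked sketch. It assumes that R² denotes the coefficient of determination (1 minus the ratio of residual to total sum of squares); the abstract does not define the metric, and a squared correlation coefficient is another common reading. The function name `r_squared` and the per-frame arousal values are hypothetical and chosen only for illustration; they are not taken from the MediaEval corpus.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Assumption: this is the R2 variant meant in the abstract.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error of the predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: variance of the annotations around their mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined for constant annotations")
    return 1.0 - ss_res / ss_tot


# Hypothetical per-frame arousal annotations and predictions for one piece.
arousal_true = [0.1, 0.3, 0.5, 0.4, 0.2]
arousal_pred = [0.15, 0.25, 0.45, 0.45, 0.25]
score = r_squared(arousal_true, arousal_pred)
print(f"arousal R2 = {score:.3f}")
```

Under this definition, a model that always predicts the mean annotation scores 0 and a perfect predictor scores 1, so the reported values indicate that arousal is predicted considerably better than valence.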