Wav2vec2-based Paralinguistic Systems to Recognise Vocalised Emotions and Stuttering

Tamas Grosz; Dejan Porjazovski; Yaroslav Getman; Sudarsana Kadiri; Mikko Kurimo

Conference Proceedings, Open Access

MM 2022 - Proceedings of the 30th ACM International Conference on Multimedia (2022) 7026-7029

DOI: 10.1145/3503161.3551572

21 Citations

7 Readers

Abstract

With the rapid advancement of automatic speech recognition and natural language understanding, a complementary field, paralinguistics, has emerged that focuses on the non-verbal content of speech. The ACM Multimedia 2022 Computational Paralinguistics Challenge introduced several exciting tasks in this field. In this work, we tackle two of its Sub-Challenges using modern pre-trained wav2vec2 models. Our experimental results demonstrate that wav2vec2 is an excellent tool for detecting the emotions behind vocalisations and for recognising different types of stuttering. Although wav2vec2-based systems achieve outstanding results on their own, our results show that they can be improved further by ensembling them with other models. Our best systems outperformed the competition baselines by a considerable margin, achieving an unweighted average recall of 44.0 (an absolute improvement of 6.6% over the baseline) on the Vocalisation Sub-Challenge and 62.1 (an absolute improvement of 21.7% over the baseline) on the Stuttering Sub-Challenge.
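
The results above are reported as unweighted average recall (UAR), which is the mean of the per-class recalls, so every class counts equally regardless of how many samples it has. The following is a minimal sketch of that standard definition in plain Python. It is not the authors' evaluation code, and the function name and toy labels are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Return the mean of per-class recalls (standard UAR definition)."""
    total = defaultdict(int)    # number of reference samples per class
    correct = defaultdict(int)  # number of correctly predicted samples per class
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    # Recall is computed per class, then averaged without class-size weighting.
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Hypothetical, imbalanced toy labels (not data from the paper).
y_true = ["a", "a", "a", "a", "b", "b"]
y_pred = ["a", "a", "a", "b", "b", "a"]

uar = unweighted_average_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"UAR = {uar:.3f}, accuracy = {accuracy:.3f}")
```

On imbalanced data like this toy example, UAR and plain accuracy differ because the smaller class contributes as much to UAR as the larger one.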

Citation (APA)

Grosz, T., Porjazovski, D., Getman, Y., Kadiri, S., & Kurimo, M. (2022). Wav2vec2-based Paralinguistic Systems to Recognise Vocalised Emotions and Stuttering. In MM 2022 - Proceedings of the 30th ACM International Conference on Multimedia (pp. 7026–7029). Association for Computing Machinery, Inc. https://doi.org/10.1145/3503161.3551572
