Transfer Learning in Speaker’s Age and Gender Recognition


Abstract

In this paper, we study the application of the transfer learning approach to the speaker's age and gender recognition task. Speech analysis systems that take images of log-Mel spectrograms or MFCCs as input for classification have recently gained popularity. We therefore used models pretrained on the ImageNet task with strong performance, such as AlexNet, VGG-16, ResNet18, ResNet34, and ResNet50, as well as the state-of-the-art EfficientNet-B4 from Google. Additionally, we trained 1D CNN and TDNN models for speaker's age and gender recognition. We compared the performance of these models on age (4 classes), gender (3 classes), and joint age and gender (7 classes) recognition. Despite the high performance of the pretrained models on the ImageNet task, our TDNN models achieved better UAR in all tasks presented in this study: age (UAR = 51.719%), gender (UAR = 81.746%), and joint age and gender (UAR = 48.969%) recognition.
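The transfer-learning recipe described above can be sketched as follows: an ImageNet-pretrained CNN is reused as a feature extractor and only its classification head is replaced to predict the speech classes. The snippet below is a minimal illustration, not the authors' code; the class count, input shape, channel handling, and optimizer settings are assumptions for the sake of the example.

```python
# Minimal sketch of transfer learning from an ImageNet-pretrained CNN
# to age classification from log-Mel spectrogram "images".
import torch
import torch.nn as nn
from torchvision import models

NUM_AGE_CLASSES = 4  # assumed 4 age groups, as in the abstract

# Load ImageNet-pretrained ResNet18 and swap its classification head.
model = models.resnet18(pretrained=True)
model.fc = nn.Linear(model.fc.in_features, NUM_AGE_CLASSES)

# Log-Mel spectrograms are fed as 3-channel images (e.g. the single
# channel repeated three times) so the pretrained convolutional stem
# can be reused without modification.
dummy_batch = torch.randn(8, 3, 224, 224)   # batch of spectrogram "images"
logits = model(dummy_batch)                  # shape: (8, NUM_AGE_CLASSES)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
```

For evaluation, the UAR (unweighted average recall) reported in the abstract is the recall averaged over classes with equal weight, which corresponds to `sklearn.metrics.recall_score(y_true, y_pred, average="macro")`.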

Citation (APA)

Markitantov, M. (2020). Transfer Learning in Speaker’s Age and Gender Recognition. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 12335 LNAI, pp. 326–335). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-030-60276-5_32
