Transfer Learning in Speaker’s Age and Gender Recognition


Abstract

In this paper, we study the application of the transfer learning approach to the speaker's age and gender recognition task. Speech analysis systems that take images of log-Mel spectrograms or MFCCs as input for classification have recently gained popularity. We therefore used models pretrained on the ImageNet task with strong performance, such as AlexNet, VGG-16, ResNet18, ResNet34, and ResNet50, as well as the state-of-the-art EfficientNet-B4 from Google. Additionally, we trained 1D CNN and TDNN models for speaker's age and gender recognition. We compared the performance of these models on age (4 classes), gender (3 classes), and joint age and gender (7 classes) recognition. Despite the high performance of the pretrained models on the ImageNet task, our TDNN models achieved better UAR in all tasks presented in this study: age (UAR = 51.719%), gender (UAR = 81.746%), and joint age and gender (UAR = 48.969%) recognition.
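The transfer-learning recipe described above can be sketched as follows: an ImageNet-pretrained CNN is reused as a feature extractor and only its classification head is replaced to predict the speech classes. The snippet below is a minimal illustration, not the authors' code; the class count, input shape, channel handling, and optimizer settings are assumptions for the sake of the example.

```python
# Minimal sketch of transfer learning from an ImageNet-pretrained CNN
# to age classification from log-Mel spectrogram "images".
import torch
import torch.nn as nn
from torchvision import models

NUM_AGE_CLASSES = 4  # assumed 4 age groups, as in the abstract

# Load ImageNet-pretrained ResNet18 and swap its classification head.
model = models.resnet18(pretrained=True)
model.fc = nn.Linear(model.fc.in_features, NUM_AGE_CLASSES)

# Log-Mel spectrograms are fed as 3-channel images (e.g. the single
# channel repeated three times) so the pretrained convolutional stem
# can be reused without modification.
dummy_batch = torch.randn(8, 3, 224, 224)   # batch of spectrogram "images"
logits = model(dummy_batch)                  # shape: (8, NUM_AGE_CLASSES)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
```

For evaluation, the UAR (unweighted average recall) reported in the abstract is the recall averaged over classes with equal weight, which corresponds to `sklearn.metrics.recall_score(y_true, y_pred, average="macro")`.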

Citation (APA)

Markitantov, M. (2020). Transfer Learning in Speaker’s Age and Gender Recognition. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 12335 LNAI, pp. 326–335). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-030-60276-5_32
