Deep neural networks for shimmer approximation in synthesized audio signal

Abstract

Shimmer is a classical acoustic measure of amplitude perturbation in a signal. In the human voice, this kind of variation helps characterize properties not only of the voice itself but also of the speaker. In recent years, deep learning techniques have become the state of the art for voice recognition tasks. This work analyzes the relationship between shimmer and deep neural networks. A deep learning model is built that approximates the shimmer value of a simple synthesized audio signal (stationary and without formants), taking the spectrogram as its input feature. Two conclusions are drawn: first, that for this kind of synthesized signal a neural network like the one proposed can approximate shimmer; second, that the convolution layers can be designed to preserve the shimmer information and pass it on to the following layers.
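To make the quantity concrete, the following is a minimal sketch of how shimmer might be computed on a simple synthesized signal. The abstract does not state which shimmer formula the authors use, so the sketch assumes the common "local shimmer" definition (mean absolute difference between consecutive cycle amplitudes, divided by the mean amplitude). The function names, the 100 Hz fundamental, the 16 kHz sample rate, and the 5% amplitude perturbation are all illustrative assumptions, not values from the paper, and no neural network is involved here.

```python
import numpy as np

def local_shimmer(cycle_amplitudes):
    """Assumed definition: mean |A[i+1] - A[i]| divided by mean A."""
    a = np.asarray(cycle_amplitudes, dtype=float)
    if a.size < 2:
        raise ValueError("need at least two cycles")
    return np.mean(np.abs(np.diff(a))) / np.mean(a)

def synthesize(cycle_amplitudes, f0=100.0, sample_rate=16000):
    """Stationary sine wave without formants, one amplitude per cycle."""
    samples_per_cycle = int(round(sample_rate / f0))
    t = np.arange(samples_per_cycle) / sample_rate
    one_cycle = np.sin(2 * np.pi * f0 * t)
    return np.concatenate([a * one_cycle for a in cycle_amplitudes])

def cycle_peaks(signal, f0=100.0, sample_rate=16000):
    """Recover the peak amplitude of each cycle from the waveform."""
    samples_per_cycle = int(round(sample_rate / f0))
    n_cycles = signal.size // samples_per_cycle
    frames = signal[: n_cycles * samples_per_cycle]
    frames = frames.reshape(n_cycles, samples_per_cycle)
    return np.max(np.abs(frames), axis=1)

# Hypothetical test signal: 200 cycles with about 5% amplitude perturbation.
rng = np.random.default_rng(0)
amps = 1.0 + 0.05 * rng.standard_normal(200)
signal = synthesize(amps)
recovered = cycle_peaks(signal)

print("shimmer from known amplitudes:", local_shimmer(amps))
print("shimmer measured on waveform: ", local_shimmer(recovered))
```

A value computed this way could serve as the regression target for a model that sees only the spectrogram of `signal`, which is the kind of setup the abstract describes.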

Cite

García, M. A., & Destéfanis, E. A. (2018). Deep neural networks for shimmer approximation in synthesized audio signal. In Communications in Computer and Information Science (Vol. 790, pp. 3–12). Springer Verlag. https://doi.org/10.1007/978-3-319-75214-3_1
