A modulation-demodulation model of speech communication


Abstract

Perceptual invariance in the face of large acoustic variability in speech has long been discussed in speech science and engineering [1], and it remains an open question [2, 3]. Recently, we proposed a candidate answer based on mathematically guaranteed relational invariance [4, 5]. In that approach, completely transform-invariant features, f-divergences, are extracted from the speech dynamics of an utterance and used to represent that utterance. In this paper, this representation is interpreted from the viewpoints of telecommunications and evolutionary anthropology. Speech production is often regarded as a process of modulating the baseline timbre of a speaker's voice by manipulating the vocal organs, i.e., spectrum modulation. Extraction of the linguistic content from an utterance can then be viewed as a process of spectrum demodulation. This modulation-demodulation model of speech communication links well to known morphological and cognitive differences between humans and apes. The model also claims that linguistic content is transmitted mainly by supra-segmental features.
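
The transform invariance of f-divergences mentioned in the abstract can be illustrated with a small numerical sketch. It is not the paper's method. It assumes two one-dimensional Gaussian distributions standing in for two speech events, uses the Kullback-Leibler divergence as one example of an f-divergence, and models speaker variability as a hypothetical invertible affine map x -> a*x + b. The function names and all numeric values are invented for illustration.

```python
import math


def kl_gauss(mu_p, sd_p, mu_q, sd_q):
    """KL divergence KL(p || q) between two 1-D Gaussians (closed form)."""
    return (math.log(sd_q / sd_p)
            + (sd_p ** 2 + (mu_p - mu_q) ** 2) / (2.0 * sd_q ** 2)
            - 0.5)


def affine(mu, sd, a, b):
    """Push N(mu, sd^2) through the invertible map x -> a*x + b (a != 0)."""
    return a * mu + b, abs(a) * sd


# Hypothetical (mean, standard deviation) pairs for two speech events.
p = (1.0, 0.5)
q = (2.5, 0.8)

# Hypothetical transform standing in for a change of speaker or channel.
a, b = -3.0, 7.0

before = kl_gauss(*p, *q)
after = kl_gauss(*affine(*p, a, b), *affine(*q, a, b))

# The two values agree up to floating-point rounding.
print(before, after)
```

Both distributions are mapped by the same transform, so the scale factor a cancels in every term of the divergence and the offset b cancels in the difference of means. The individual distributions change while the contrast between them does not, which is the sense in which the representation is relational.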

Cite

Minematsu, N. (2010). A modulation-demodulation model of speech communication. In Proceedings of the International Conference on Speech Prosody. International Speech Communication Association. https://doi.org/10.21437/speechprosody.2010-113
