Emotional speech acoustic model for Malay: Iterative versus isolated unit training

  • Mustafa M
  • Ainon R

Abstract

The ability of a speech synthesis system to synthesize emotional speech enhances the user's experience of such systems and their related applications. However, developing an emotional speech synthesis system is a daunting task given the complexity of human emotional speech. Recent state-of-the-art speech synthesis systems, such as those based on hidden Markov models, can synthesize emotional speech with acceptable naturalness, provided a good emotional speech acoustic model is available. Building such a model requires adequate resources, including segment-level phonetic labels of emotional speech, which are lacking for many under-resourced languages, including Malay. This research shows that an emotional speech acoustic model for Malay can be built with minimal resources. To achieve this objective, two initialization methods were considered: iterative training using the deterministic annealing expectation maximization algorithm, and isolated unit training. The seed model for the automatic segmentation is a neutral speech acoustic model, which was transformed to the target emotion using two transformation techniques: model adaptation and context-dependent boundary refinement. Two forms of evaluation were performed: an objective evaluation measuring the prosody error, and a listening evaluation measuring the naturalness of the synthesized emotional speech.
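The deterministic annealing expectation maximization algorithm named in the abstract differs from ordinary EM mainly in its E-step, where the posterior over hidden states is tempered by an inverse temperature beta that is raised gradually to 1. The following is a minimal sketch of that tempering step only, not the authors' implementation: the function name `tempered_posteriors`, the example scores, and the annealing schedule are all illustrative assumptions.

```python
import math


def tempered_posteriors(log_joint, beta):
    """Return posteriors proportional to exp(beta * log_joint[k]).

    log_joint[k] is log p(observation, state=k). With beta = 1.0 this is the
    ordinary EM posterior; smaller beta flattens it toward uniform.
    """
    scaled = [beta * value for value in log_joint]
    top = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(value - top) for value in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical log joint scores for three candidate states (assumed values).
log_joint = [-2.0, -5.0, -9.0]

# Hypothetical annealing schedule (assumed); beta rises toward 1.0.
for beta in (0.1, 0.5, 1.0):
    posteriors = tempered_posteriors(log_joint, beta)
    print(beta, [round(p, 3) for p in posteriors])
```

At small beta the posteriors are nearly uniform, so early parameter updates depend less on a poor initial alignment; as beta approaches 1 the procedure becomes standard EM.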

Citation (APA)

Mustafa, M. B., & Ainon, R. N. (2013). Emotional speech acoustic model for Malay: Iterative versus isolated unit training. The Journal of the Acoustical Society of America, 134(4), 3057–3066. https://doi.org/10.1121/1.4818741