Implementing a Statistical Parametric Speech Synthesis System for a Patient with Laryngeal Cancer

Krzysztof Szklanny; Jakub Lachowicz

Journal Article (Open Access)

Sensors (2022) 22(9)

DOI: 10.3390/s22093188

Abstract

Total laryngectomy, i.e., the surgical removal of the larynx, has a profound influence on a patient’s quality of life. The procedure results in the loss of natural voice, which constitutes a significant socio-psychological problem for the patient. The main aim of the study was to develop a statistical parametric speech synthesis system for a patient with laryngeal cancer on the basis of speech samples recorded shortly before the surgery, and to check whether it was possible to generate speech of a quality close to that of the original recordings. The recordings used a representative corpus of the Polish language consisting of 2150 sentences. The recorded voice showed signs of dysphonia, which was confirmed by the auditory-perceptual RBH scale (roughness, breathiness, hoarseness) and by acoustic analysis using the Acoustic Voice Quality Index (AVQI). The speech synthesis model was trained using the Merlin repository. Twenty-five experts participated in the MUSHRA listening tests and rated the patient’s synthetic voice at 69.4 on a 0–100 scale, relative to the professional voice-over talent recording, which the authors describe as a very good result. The authors also compared the synthetic voice with another synthetic speech model trained on the same corpus, but with speech samples recorded by a voice-over talent. The same experts rated that voice at 63.63, which means that the synthetic voice of the patient with laryngeal cancer obtained a higher score than the synthetic voice built from the voice talent’s recordings. The method thus enabled the creation of a statistical parametric speech synthesizer for patients awaiting total laryngectomy. Such a solution could improve the patient’s quality of life and mental wellbeing.

Citation (APA)

Szklanny, K., & Lachowicz, J. (2022). Implementing a Statistical Parametric Speech Synthesis System for a Patient with Laryngeal Cancer. Sensors, 22(9). https://doi.org/10.3390/s22093188
