DNN-based speech synthesis using dialogue-act information and its evaluation with respect to illocutionary act naturalness

Nobukatsu Hojo; Yusuke Ijima; Hiroaki Sugiyama; Noboru Miyazaki; Takahito Kawanishi; Kunio Kashino

Journal Article (Open Access)

Transactions of the Japanese Society for Artificial Intelligence (2020) 35(2) 1-17

DOI: 10.1527/tjsai.A-J81

Abstract

This paper aims to improve the naturalness of speech synthesized by a text-to-speech (TTS) system within a spoken dialogue system, specifically with respect to how naturally the system’s intention is perceived through the synthesized speech. We call this measure “illocutionary act naturalness”. To this end, we propose using dialogue-act (DA) information as an auxiliary feature for a deep neural network (DNN)-based speech synthesis system. First, we construct a speech database annotated with DA tags. Second, we build the proposed DNN-based speech synthesis system on this database. We then evaluate the proposed method against two conventional hidden Markov model (HMM)-based speech synthesis systems: the style-mixed modeling method and the style adaptation method. The objective evaluation results show that the proposed method outperforms the style-mixed modeling method in reproducing the global prosodic characteristics of dialogue acts, and outperforms the style adaptation method in reproducing their sentence-final tone characteristics. The subjective evaluation results also show that the proposed method improves illocutionary act naturalness compared with the two conventional methods.
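As a rough illustration of what using DA information as an "auxiliary feature" could look like, the following Python sketch appends a one-hot DA vector to each frame of linguistic input features before they would be passed to a DNN acoustic model. The tag inventory, the feature values, and the function names are all hypothetical; the abstract does not specify the tag set, the encoding, or the network architecture, so this is a sketch under those assumptions rather than the authors' implementation.

```python
from typing import List, Sequence

# Hypothetical dialogue-act inventory (assumption: the abstract does not list the tags).
DA_TAGS = ["greeting", "question", "information", "apology"]


def one_hot_da(tag: str, tags: Sequence[str] = DA_TAGS) -> List[float]:
    """Encode a dialogue-act tag as a one-hot vector over the tag inventory."""
    if tag not in tags:
        raise ValueError(f"unknown dialogue-act tag: {tag!r}")
    return [1.0 if t == tag else 0.0 for t in tags]


def augment_frames(
    linguistic_frames: Sequence[Sequence[float]],
    da_tag: str,
    tags: Sequence[str] = DA_TAGS,
) -> List[List[float]]:
    """Append the utterance-level DA vector to every frame's linguistic features.

    The same DA vector is repeated on each frame, so a network consuming
    these inputs sees the dialogue act alongside the usual features.
    """
    da_vec = one_hot_da(da_tag, tags)
    return [list(frame) + da_vec for frame in linguistic_frames]


# Two made-up frames with three linguistic features each (illustrative values only).
frames = [[0.2, 0.7, 1.0], [0.1, 0.4, 0.0]]
augmented = augment_frames(frames, "question")
```

Each augmented frame here has seven dimensions: the three original features followed by the four-dimensional DA vector. Repeating one utterance-level vector on every frame is only one possible design; the paper may encode or inject DA information differently.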

Cite

Hojo, N., Ijima, Y., Sugiyama, H., Miyazaki, N., Kawanishi, T., & Kashino, K. (2020). DNN-based speech synthesis using dialogue-act information and its evaluation with respect to illocutionary act naturalness. Transactions of the Japanese Society for Artificial Intelligence, 35(2), 1–17. https://doi.org/10.1527/tjsai.A-J81
