JSUT and JVS: Free Japanese voice corpora for accelerating speech synthesis research

Shinnosuke Takamichi; Ryosuke Sonobe; Kentaro Mitsui; Yuki Saito; Tomoki Koriyama; Naoko Tanji; Hiroshi Saruwatari

Journal Article, Open Access

Acoustical Science and Technology (2020) 41(5) 761-768

DOI: 10.1250/ast.41.761

Abstract

In this paper, we develop two corpora for speech synthesis research. Thanks to improvements in machine learning techniques, including deep learning, speech synthesis is becoming a machine learning task. To accelerate speech synthesis research, we aim to develop Japanese voice corpora that are reasonably accessible not only to academic institutions but also to commercial companies. To this end, we construct the JSUT and JVS corpora, which are designed mainly for text-to-speech synthesis and voice conversion, respectively. The JSUT corpus contains 10 hours of reading-style speech uttered by a single speaker, and the JVS corpus contains 30 hours of speech in three styles uttered by 100 speakers. This paper describes how we designed the corpora and summarizes their specifications. The corpora are available at our project pages.

Citation (APA)

Takamichi, S., Sonobe, R., Mitsui, K., Saito, Y., Koriyama, T., Tanji, N., & Saruwatari, H. (2020). JSUT and JVS: Free Japanese voice corpora for accelerating speech synthesis research. Acoustical Science and Technology, 41(5), 761–768. https://doi.org/10.1250/ast.41.761
