Small-parallel exemplar-based voice conversion in noisy environments using affine non-negative matrix factorization



Abstract

The need for a large amount of parallel data is a major hurdle for the practical use of voice conversion (VC). This paper presents a novel exemplar-based VC framework that requires only a small number of parallel exemplars. In our previous work, we proposed a VC technique using non-negative matrix factorization (NMF) for noisy environments. That method requires parallel exemplars (source and target exemplars of the same texts uttered by the source and target speakers) for dictionary construction. Within conventional Gaussian mixture model (GMM)-based VC, some approaches that do not need parallel exemplars have been proposed. However, no such method has been proposed for exemplar-based VC in noisy environments. In this paper, an adaptation matrix is introduced into the NMF framework to adapt the source dictionary to the target dictionary. This adaptation matrix is estimated using only a small parallel speech corpus. We refer to this method as affine NMF, and we confirm its effectiveness by comparing it with a conventional NMF-based method and a GMM-based method in noisy environments.
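The abstract does not give the estimation procedure, so the following is only a rough sketch of the general idea of adapting a source dictionary with a non-negative matrix, not the authors' algorithm. It assumes a Euclidean cost and standard multiplicative NMF updates; the function names (`estimate_activations`, `estimate_adaptation`), the matrix sizes, and the synthetic "spectra" are all hypothetical and chosen for illustration. The noise dictionary used for noisy input is omitted.

```python
import numpy as np

def estimate_activations(X, W, n_iter=200, eps=1e-12):
    """Find H >= 0 that reduces ||X - W H||_F^2 with the dictionary W held fixed."""
    rng = np.random.default_rng(0)
    H = rng.random((W.shape[1], X.shape[1])) + eps
    for _ in range(n_iter):
        # Standard multiplicative update; keeps H non-negative.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
    return H

def estimate_adaptation(X_tgt, B, n_iter=200, eps=1e-12):
    """Find A >= 0 that reduces ||X_tgt - A B||_F^2, where B = W_src @ H is fixed."""
    dim = X_tgt.shape[0]
    A = np.eye(dim) + 0.01  # strictly positive start, close to the identity
    errors = []
    for _ in range(n_iter):
        A *= (X_tgt @ B.T) / (A @ B @ B.T + eps)
        errors.append(np.linalg.norm(X_tgt - A @ B))
    return A, errors

# Synthetic stand-in for a small parallel corpus (assumed sizes, not from the paper).
rng = np.random.default_rng(1)
dim, n_atoms, n_frames = 8, 5, 40
W_src = rng.random((dim, n_atoms))      # source dictionary (exemplars as columns)
A_true = rng.random((dim, dim))         # unknown source-to-target mapping
H_true = rng.random((n_atoms, n_frames))
X_src = W_src @ H_true                  # source frames of the parallel data
X_tgt = A_true @ X_src                  # time-aligned target frames

H = estimate_activations(X_src, W_src)  # activations shared by both speakers
A, errors = estimate_adaptation(X_tgt, W_src @ H)
W_tgt = A @ W_src                       # adapted (target) dictionary
X_conv = W_tgt @ H                      # converted frames for the same activations
```

In this sketch the activations are estimated once from the source side and reused on the target side, which mirrors the usual exemplar-based assumption that parallel utterances share activations; whether the paper estimates them jointly is not stated in the abstract.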

Citation (APA)

Aihara, R., Fujii, T., Nakashika, T., Takiguchi, T., & Ariki, Y. (2015). Small-parallel exemplar-based voice conversion in noisy environments using affine non-negative matrix factorization. Eurasip Journal on Audio, Speech, and Music Processing, 2015(1), 1–9. https://doi.org/10.1186/s13636-015-0075-4
