Query-by-Example Speech Search Using Recurrent Neural Acoustic Word Embeddings with Temporal Context

This article is free to access.

Abstract

Acoustic word embeddings (AWEs) have been popular in low-resource query-by-example (QbE) speech search. They use vector distances to find the spoken query in the search content, which requires much less computation than conventional dynamic time warping (DTW)-based approaches. AWE networks are usually trained on variable-length isolated spoken words, but they are applied to fixed-length speech segments obtained by shifting an analysis window over the speech content. There is therefore an obvious mismatch between how AWEs are learned and how they are applied to search content. To mitigate this mismatch, we propose to include temporal context information in the spoken word pairs used to learn recurrent neural AWEs. More specifically, the spoken word pairs are represented by multilingual bottleneck features (BNFs) and padded with the neighboring frames of the target spoken words to form fixed-length speech segment pairs. A deep bidirectional long short-term memory (BLSTM) network is then trained with a triplet loss on these speech segment pairs. Recurrent neural AWEs are obtained by concatenating the BLSTM backward and forward outputs. During the QbE speech search stage, both the spoken query and the search content are converted into recurrent neural AWEs. Cosine distances are then measured between them to find the spoken query. The experiments show that using temporal context is essential to alleviate the mismatch. The proposed recurrent neural AWEs trained with temporal context outperform the previous state-of-the-art features, with a 12.5% relative improvement in mean average precision (MAP) on QbE speech search.
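
The search stage described in the abstract (shift a fixed-length window over the content, embed each segment, and rank by cosine distance to the query embedding) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the margin value, the hinge form of the triplet loss, and the toy two-dimensional frames are all assumptions made for illustration, and `mean_pool_embed` is a hypothetical stand-in for the trained BLSTM encoder.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def triplet_loss(anchor, positive, negative, margin=0.5):
    """Hinge-style triplet loss on cosine distances.

    Assumption: the abstract only says "a triplet loss"; this form and the
    margin value are illustrative, not taken from the paper.
    """
    d_pos = cosine_distance(anchor, positive)
    d_neg = cosine_distance(anchor, negative)
    return max(0.0, margin + d_pos - d_neg)


def mean_pool_embed(segment):
    """Hypothetical stand-in for the BLSTM encoder: average the frames."""
    dim = len(segment[0])
    return [sum(frame[d] for frame in segment) / len(segment) for d in range(dim)]


def sliding_segments(frames, window, shift):
    """Yield (start_index, segment) for fixed-length windows over the content."""
    for start in range(0, len(frames) - window + 1, shift):
        yield start, frames[start:start + window]


def search(query_segment, content_frames, embed, window, shift):
    """Return (best_start, best_distance) for the closest content segment."""
    query_emb = embed(query_segment)
    best_start, best_dist = None, float("inf")
    for start, segment in sliding_segments(content_frames, window, shift):
        dist = cosine_distance(query_emb, embed(segment))
        if dist < best_dist:  # keep the earliest segment on ties
            best_start, best_dist = start, dist
    return best_start, best_dist


# Toy data: four frames of one "sound" followed by four frames of another.
content = [[1.0, 0.0]] * 4 + [[0.0, 1.0]] * 4
query = [[0.0, 1.0]] * 2
best_start, best_dist = search(query, content, mean_pool_embed, window=2, shift=1)
print(best_start, best_dist)
```

In this toy setup the query matches the second half of the content, so the best window starts where that half begins. In a real system the embedding function would be the trained network applied to BNF frames, and the window length would match the fixed segment length used in training.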

Cite

Yuan, Y., Leung, C. C., Xie, L., Chen, H., & Ma, B. (2019). Query-by-Example Speech Search Using Recurrent Neural Acoustic Word Embeddings with Temporal Context. IEEE Access, 7, 67656–67665. https://doi.org/10.1109/ACCESS.2019.2918638
