Query-by-Example Speech Search Using Recurrent Neural Acoustic Word Embeddings with Temporal Context

Yougen Yuan; Cheung Chi Leung; Lei Xie; Hongjie Chen; Bin Ma

Journal Article, Open Access

IEEE Access (2019) 7 67656-67665

DOI: 10.1109/ACCESS.2019.2918638

Abstract

Acoustic word embeddings (AWEs) have become popular in low-resource query-by-example (QbE) speech search. They use vector distances to find the spoken query in the search content, which requires much less computation than conventional dynamic time warping (DTW)-based approaches. AWE networks are usually trained on variable-length isolated spoken words, but they are applied to fixed-length speech segments obtained by shifting an analysis window over the search content. This creates a clear mismatch between how AWEs are learned and how they are applied to search content. To mitigate this mismatch, we propose to include temporal context information in the spoken word pairs used to learn recurrent neural AWEs. More specifically, the spoken word pairs are represented by multi-lingual bottleneck features (BNFs) and padded with the neighboring frames of the target spoken words to form fixed-length speech segment pairs. A deep bidirectional long short-term memory (BLSTM) network is then trained with a triplet loss using the speech segment pairs. Recurrent neural AWEs are obtained by concatenating the BLSTM backward and forward outputs. During the QbE speech search stage, both the spoken query and the search content are converted into recurrent neural AWEs. Cosine distances are then measured between them to find the spoken query. The experiments show that using temporal context is essential to alleviate the mismatch. The proposed recurrent neural AWEs trained with temporal context outperform the previous state-of-the-art features with a 12.5% relative improvement in mean average precision (MAP) on QbE speech search.
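
The pipeline summarized in the abstract (context padding, a triplet loss on cosine distances, and a sliding-window search) might be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the centred padding strategy, the margin value of 0.4, and the `mean_pool` stand-in for the trained BLSTM embedder are all hypothetical choices made for this example.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Frames = List[Vector]  # one feature vector (e.g. a BNF frame) per time step


def cosine_distance(a: Vector, b: Vector) -> float:
    """Return 1 minus the cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def triplet_loss(anchor: Vector, positive: Vector, negative: Vector,
                 margin: float = 0.4) -> float:
    """Hinge triplet loss on cosine distances (margin value is an assumption)."""
    return max(0.0, margin
               + cosine_distance(anchor, positive)
               - cosine_distance(anchor, negative))


def pad_with_context(utterance: Frames, start: int, end: int,
                     length: int) -> Frames:
    """Cut a fixed-length segment around the word frames [start, end).

    Assumption: the word is roughly centred and the remaining frames are
    filled with its neighbours; the window is shifted at utterance edges.
    Requires len(utterance) >= length.
    """
    extra = length - (end - start)
    seg_start = start - extra // 2
    seg_start = max(0, min(seg_start, len(utterance) - length))
    return utterance[seg_start:seg_start + length]


def mean_pool(segment: Frames) -> List[float]:
    """Placeholder embedder: frame average. A trained BLSTM would go here."""
    n = len(segment)
    return [sum(frame[d] for frame in segment) / n
            for d in range(len(segment[0]))]


def search(query_emb: Vector, content: Frames,
           embed: Callable[[Frames], Vector],
           window: int, shift: int) -> Tuple[float, int]:
    """Slide a fixed-length window over the content and return the
    (smallest cosine distance, start frame) of the best-matching segment."""
    best = (float("inf"), -1)
    for s in range(0, len(content) - window + 1, shift):
        dist = cosine_distance(query_emb, embed(content[s:s + window]))
        best = min(best, (dist, s))
    return best
```

In this sketch, training segments produced by `pad_with_context` have the same fixed length as the windows that `search` embeds at query time, which is the train/search consistency the abstract argues for.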

Cite

Yuan, Y., Leung, C. C., Xie, L., Chen, H., & Ma, B. (2019). Query-by-Example Speech Search Using Recurrent Neural Acoustic Word Embeddings with Temporal Context. IEEE Access, 7, 67656–67665. https://doi.org/10.1109/ACCESS.2019.2918638
