This document describes the implementation of a spoken information retrieval system and its application to word search in an indexed video database. The system uses automatic speech recognition (ASR) software to convert the audio signal of a video file into a transcript file, and then a document indexing tool to index that transcript. A spoken query, uttered by any user, is then passed to the ASR, which decodes the audio signal and proposes a hypothesis that is used to formulate a query against the indexed database. The final output of the system is a list of video frame tags for the segments whose audio corresponds to the spoken query. The speech recognition system achieved a Word Error Rate (WER) below 15%, and its combined operation with the document indexing system showed outstanding performance with spoken queries. © Springer-Verlag 2004.
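The retrieval flow described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the ASR output is already available as plain text per video segment, uses a simple in-memory inverted index in place of the document indexing tool, and treats a "frame tag" as a timestamp string. The names `build_index`, `search`, and `word_error_rate` are hypothetical. The WER helper uses the standard definition, word-level edit distance divided by the number of reference words.

```python
from collections import defaultdict


def build_index(segments):
    """Build an inverted index from (tag, transcript) pairs.

    Each lowercased word maps to the sorted list of tags whose
    transcript contains it. The tag format is an assumption.
    """
    index = defaultdict(set)
    for tag, text in segments:
        for word in text.lower().split():
            index[word].add(tag)
    return {word: sorted(tags) for word, tags in index.items()}


def search(index, hypothesis):
    """Return the tags of segments containing every word of the
    decoded query hypothesis (a simple AND query)."""
    words = hypothesis.lower().split()
    if not words:
        return []
    hits = set(index.get(words[0], []))
    for word in words[1:]:
        hits &= set(index.get(word, []))
    return sorted(hits)


def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,                          # deletion
                curr[j - 1] + 1,                      # insertion
                prev[j - 1] + (ref_word != hyp_word)  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)
```

In this sketch, the text decoded from a spoken query would be passed to `search` as `hypothesis`, and the returned tags point to the video segments to present to the user. A real system would likely add stemming, stop-word handling, and ranking, none of which is shown here.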
CITATION STYLE
Salgado-Garza, L. R., & Nolazco-Flores, J. A. (2004). On the use of automatic speech recognition for spoken information retrieval from video databases. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 3287, 381–385. https://doi.org/10.1007/978-3-540-30463-0_47