Automatic speech recognition (ASR) task constitutes a well-known issue among fields like Natural Language Processing (NLP), Digital Signal Processing (DSP) and Machine Learning (ML). In this work, a robust supervised classification model is presented (MFCCs + autocor + SVM) for feature extraction of solo speech signals. Mel Frequency Cepstral Coefficients (MFCCs) are exploited combined with Content Based Image Retrieval (CBIR) features extracted from spectrogram produced by each frame of the speech signal. Improvement of classification accuracy using such extended feature vectors is examined against using only MFCCs with several classifiers for three scenarios of different number of speakers.
CITATION STYLE
Karlos, S., Fazakis, N., Karanikola, K., Kotsiantis, S., & Sgarbas, K. (2016). Speech recognition combining MFCCs and image features. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9811 LNCS, pp. 651–658). Springer Verlag. https://doi.org/10.1007/978-3-319-43958-7_79
Mendeley helps you to discover research relevant for your work.