Automatic segmentation of spoken word signals into letters based on amplitude variation for speech to text transcription

Abstract

This paper presents a technique for automatically segmenting spoken-word signals into letters so that they can be transcribed into text. The method relies on the signal patterns of each letter as it occurs in different words. Voice signals were obtained by recording pronunciations of 1,000 words from a standard dictionary. The collected signals are first pre-processed to reduce noise, using a heuristically determined threshold value. Each signal is then segmented, based on Amplitude Variation (AV), into portions that each correspond to one letter of the word. Signal Peak Value (SPV) is the feature used to recognize the letters. The accuracy of the method is estimated using the Bagging, Bayes Net, J48, Naive Bayes, PART and SVM classifiers available in Weka. The best classification accuracy obtained is 95.15% (with the J48 classifier) and the average is 86.92%, which the authors consider quite acceptable.
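The abstract does not give the exact segmentation rule, so the following is only a minimal sketch of the general idea, under stated assumptions: noise is suppressed by zeroing samples below a threshold, a segment boundary is placed wherever the amplitude stays at zero for a minimum number of samples, and the peak absolute amplitude of each segment serves as its feature. The function names, the `min_gap` parameter, and the sample values are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple


def suppress_noise(signal: List[float], threshold: float) -> List[float]:
    """Zero out samples whose absolute amplitude is below the threshold.

    Assumption: the paper's heuristic threshold is applied per sample.
    """
    return [s if abs(s) >= threshold else 0.0 for s in signal]


def segment_by_amplitude(signal: List[float], min_gap: int) -> List[Tuple[int, int]]:
    """Split the signal at runs of at least `min_gap` silent (zero) samples.

    Returns half-open (start, end) index pairs, one per non-silent segment.
    This is an illustrative stand-in for amplitude-variation segmentation.
    """
    segments: List[Tuple[int, int]] = []
    start = None       # index where the current segment began
    last_active = -1   # index of the most recent non-zero sample
    for i, s in enumerate(signal):
        if s != 0.0:
            if start is None:
                start = i
            last_active = i
        elif start is not None and i - last_active >= min_gap:
            # Silence has lasted long enough: close the current segment.
            segments.append((start, last_active + 1))
            start = None
    if start is not None:
        segments.append((start, last_active + 1))
    return segments


def peak_values(signal: List[float], segments: List[Tuple[int, int]]) -> List[float]:
    """Return the peak absolute amplitude of each segment (an SPV-like feature)."""
    return [max(abs(s) for s in signal[a:b]) for a, b in segments]


if __name__ == "__main__":
    # Toy signal with two bursts separated by near-silence (made-up values).
    raw = [0.01, 0.5, 0.8, 0.3, 0.02, 0.01, 0.0, 0.6, 0.9, 0.4, 0.01]
    cleaned = suppress_noise(raw, threshold=0.1)
    segs = segment_by_amplitude(cleaned, min_gap=2)
    print(segs)                        # [(1, 4), (7, 10)]
    print(peak_values(cleaned, segs))  # [0.8, 0.9]
```

In a pipeline like the one described, per-segment features of this kind would then be exported to a classifier such as those in Weka; that step is not shown here.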

Cite

Roy, A., & Phadikar, S. (2015). Automatic segmentation of spoken word signals into letters based on amplitude variation for speech to text transcription. In Advances in Intelligent Systems and Computing (Vol. 340, pp. 621–628). Springer Verlag. https://doi.org/10.1007/978-81-322-2247-7_63
