This paper presents a conversational speech recognition system able to operate in non-stationary reverberant environments. The system consists of a dereverberation front-end that exploits multiple distant microphones, followed by a speech recognition engine. The front-end identifies the room impulse responses through a blind channel identification stage based on the Unconstrained Normalized Multi-Channel Frequency-Domain Least Mean Square (UNMCFLMS) algorithm. The dereverberation stage, based on adaptive inverse filter theory, uses the identified responses to compute a set of inverse filters, which are then applied to estimate the clean speech. The speech recognizer uses tied-state cross-word triphone models and decodes features computed from the dereverberated speech signal. Experiments on the Buckeye corpus of conversational speech show a relative word accuracy improvement of 17.48% in the stationary case and 11.16% in the non-stationary case. © 2012 Springer-Verlag.
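To make the two front-end stages concrete, the following is a minimal sketch under simplifying assumptions: two microphones, noise-free signals, stationary channels, and a known channel length L. It is not the authors' method. In place of the adaptive UNMCFLMS algorithm it uses batch least-squares blind identification based on the cross-relation x1 * h2 = x2 * h1, and in place of the adaptive inverse filter it uses a closed-form least-squares MINT (multiple-input/output inverse theorem) solve. All function and variable names are hypothetical.

```python
import numpy as np


def delay_matrix(x, L):
    """Rows are [x(n), x(n-1), ..., x(n-L+1)] for n = L-1 .. len(x)-1."""
    return np.array([x[n - L + 1:n + 1][::-1] for n in range(L - 1, len(x))])


def cross_relation_bci(x1, x2, L):
    """Batch blind identification of two FIR channels (stand-in for UNMCFLMS).

    Each row [-x2^T(n), x1^T(n)] times [h1; h2] is the cross-relation error.
    The right singular vector with the smallest singular value is the
    unit-norm [h1; h2] minimising that error energy (true channels up to scale).
    """
    V = np.hstack([-delay_matrix(x2, L), delay_matrix(x1, L)])
    _, _, vt = np.linalg.svd(V, full_matrices=False)
    h = vt[-1]
    return h[:L], h[L:]


def convolution_matrix(h, n_cols):
    """Matrix C such that C @ g equals np.convolve(h, g) when len(g) == n_cols."""
    C = np.zeros((len(h) + n_cols - 1, n_cols))
    for k in range(n_cols):
        C[k:k + len(h), k] = h
    return C


def mint_inverse(h1, h2, delay=0):
    """Least-squares MINT: find g1, g2 with h1*g1 + h2*g2 = delta(n - delay)."""
    L = len(h1)
    Lg = L - 1  # gives a square system for two coprime channels of length L
    H = np.hstack([convolution_matrix(h1, Lg), convolution_matrix(h2, Lg)])
    target = np.zeros(H.shape[0])
    target[delay] = 1.0
    g, *_ = np.linalg.lstsq(H, target, rcond=None)
    return g[:Lg], g[Lg:]


# Synthetic two-microphone setup: white source through two random FIR "rooms".
rng = np.random.default_rng(0)
L = 8
h1_true = rng.standard_normal(L)
h2_true = rng.standard_normal(L)
s = rng.standard_normal(4000)
x1 = np.convolve(s, h1_true)
x2 = np.convolve(s, h2_true)

# Stage 1: blind channel identification (estimates are defined up to a scale).
h1_est, h2_est = cross_relation_bci(x1, x2, L)
true = np.concatenate([h1_true, h2_true])
est = np.concatenate([h1_est, h2_est])
similarity = abs(true @ est) / np.linalg.norm(true)  # est already has unit norm
print(f"normalised projection onto true channels: {similarity:.6f}")

# Stage 2: inverse filters from the identified responses, then dereverberation.
g1, g2 = mint_inverse(h1_est, h2_est)
equalised = np.convolve(h1_est, g1) + np.convolve(h2_est, g2)
s_hat = np.convolve(x1, g1) + np.convolve(x2, g2)  # equals s up to a scale factor
print(f"source correlation: {abs(np.corrcoef(s_hat[:len(s)], s)[0, 1]):.6f}")
```

Because blind identification recovers the responses only up to a common gain, the dereverberated output matches the clean source only up to that gain. A single batch SVD also assumes the responses do not change during the observation window; an adaptive algorithm such as UNMCFLMS updates its estimates frame by frame and can therefore track responses that vary over time.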
Rotili, R., Principi, E., Wöllmer, M., Squartini, S., & Schuller, B. (2012). Cognitive Behavioural Systems. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 7403, 50–59. Retrieved from http://www.scopus.com/inward/record.url?eid=2-s2.0-84870312972&partnerID=tZOtx3y1