Automatic identification of activities can provide caregivers of persons with dementia with information for identifying assistance needs. Environmental audio carries significant and representative information about the context, which makes microphones a natural choice for identifying activities automatically. In real situations, however, the audio captured by microphones comes from overlapping sound sources, which makes identification a challenge for audio analysis and retrieval. In this paper we propose a succinct representation of the signal obtained by measuring its multiband spectral entropy frame by frame, followed by a cosine transform and binary codification. We call this the Cosine Multi-Band Spectral Entropy Signature (CMBSES). To test our proposal, we created a database of mixtures of three sounds (triples) drawn from a collection of nine environmental sounds, at four different signal-to-noise ratios (SNRs). We codified both the original sounds and the triples, and then searched for each original sound in the collection of mixtures. To establish a ground truth, we also tested the same database with 48 people of assorted ages. Our feature extraction outperforms the state-of-the-art Mel Frequency Cepstral Coefficients (MFCC), and it also surpasses the human participants in the experiment. © 2012 Springer-Verlag Berlin Heidelberg.
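The pipeline named in the abstract (multiband spectral entropy per frame, a cosine transform, then binary codification) can be illustrated with a minimal sketch. The abstract gives no parameters, so everything concrete here is an assumption rather than the authors' method: the frame length, hop size, number of bands, equal-width band split, the binarization rule (a bit is 1 when a coefficient increases from one frame to the next), and the use of a normalized Hamming distance for comparison. All function names are hypothetical.

```python
import numpy as np

def multiband_spectral_entropy(signal, frame_len=1024, hop=512, n_bands=8):
    """Return an (n_frames, n_bands) array of per-band spectral entropies.

    frame_len, hop and n_bands are illustrative assumptions.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    entropies = np.empty((n_frames, n_bands))
    for i in range(n_frames):
        frame = signal[i * hop : i * hop + frame_len] * window
        power = np.abs(np.fft.rfft(frame)) ** 2
        # Assumption: equal-width bands over the FFT bins.
        for b, band in enumerate(np.array_split(power, n_bands)):
            p = band / (band.sum() + 1e-12)  # normalize to a distribution
            entropies[i, b] = -np.sum(p * np.log2(p + 1e-12))
    return entropies

def dct_ii(x):
    """Unnormalized DCT-II along the last axis."""
    n = x.shape[-1]
    k = np.arange(n)[:, None]  # output coefficient index
    m = np.arange(n)[None, :]  # input sample index
    basis = np.cos(np.pi * (m + 0.5) * k / n)
    return x @ basis.T

def binary_signature(signal, **kwargs):
    """Binary code per frame transition.

    Assumption: a bit is 1 when the cosine coefficient rises between
    consecutive frames, and 0 otherwise.
    """
    coeffs = dct_ii(multiband_spectral_entropy(signal, **kwargs))
    return (np.diff(coeffs, axis=0) > 0).astype(np.uint8)

def hamming_distance(sig_a, sig_b):
    """Fraction of differing bits between two equally shaped signatures."""
    return float(np.mean(sig_a != sig_b))

# Usage on synthetic noise standing in for one second of audio at 8 kHz.
rng = np.random.default_rng(0)
audio = rng.standard_normal(8000)
signature = binary_signature(audio)
print(signature.shape, hamming_distance(signature, signature))
```

Under these assumptions, two recordings would be compared by sliding the shorter signature over the longer one and keeping the offset with the smallest Hamming distance. Whether the paper searches this way is not stated in the abstract.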
CITATION
Beltrán-Márquez, J., Chávez, E., & Favela, J. (2012). Environmental sound recognition by measuring significant changes in the spectral entropy. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7329 LNCS, pp. 334–343). https://doi.org/10.1007/978-3-642-31149-9_34