General audio consists of a wide range of sound phenomena such as music, sound effects, environmental sounds, speech and nonspeech utterances. The sound recognition tools provide a means for classifying and querying such diverse audio content using probabilistic models. This chapter gives an overview of the tools and discusses applications to automatic content classification and content-based searching.
The sound classification and indexing tools are organized into two groups. The first is the low-level descriptors (LLDs) AudioSpectrumBasis and AudioSpectrumProjection. The second is the high-level description schemes (DSs) SoundModel and SoundClassificationModel, which are based on the ContinuousHiddenMarkovModel and ProbabilityClassificationModel DSs defined in the Multimedia Description Schemes (MDS) document. The tools support two broad types of sound description: text-based description by class labels, and quantitative description using probabilistic models. Class labels, called terms, provide qualitative information about sound content. Terms are organized into classification schemes, or taxonomies, such as music genres or sound effects. Descriptions in this form suit text-based query applications, such as Internet search engines, or any processing tool that uses text fields. The quantitative descriptors, in contrast, consist of compact mathematical information about an audio segment and may be used for numerical evaluation of sound similarity. These descriptors are used in audio query-by-example (QBE) applications. Because the low-level features are general, they can be applied to many different sound types. We start by discussing these LLDs.