Sound engineers need to access vast collections of sound effects for their film and video productions. Sound effects providers rely on text-retrieval techniques to give access to their collections. Currently, audio content is annotated manually, which is an arduous task. Automatic annotation methods, normally fine-tuned to reduced domains such as musical instruments or limited sound effects taxonomies, are not mature enough to label any possible sound in great detail. A general sound recognition tool would require, first, a taxonomy that represents the world and, second, thousands of classifiers, each specialized in distinguishing fine details. We report experimental results on a general sound annotator. To tackle the taxonomy definition problem we use WordNet, a semantic network that organizes real-world knowledge. To avoid the need for a huge number of classifiers to distinguish many different sound classes, we use a nearest-neighbor classifier with a database of isolated sounds unambiguously linked to WordNet concepts. A concept prediction rate of 30% is achieved on a database of over 50,000 sounds and over 1,600 concepts.
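The core idea of the nearest-neighbor annotator can be sketched in a few lines: each database sound carries a feature vector and a WordNet concept, and a query sound inherits the concept of its closest neighbor. The feature vectors, concept labels, and data below are illustrative placeholders, not the paper's actual descriptors or dataset.

```python
import math

# Toy database of (feature_vector, wordnet_concept) pairs. The vectors
# stand in for perceptual audio descriptors; the strings stand in for
# WordNet synset identifiers. All values are made up for illustration.
SOUND_DB = [
    ((0.9, 0.1, 0.2), "dog.n.01"),
    ((0.8, 0.2, 0.1), "dog.n.01"),
    ((0.1, 0.9, 0.7), "rain.n.01"),
    ((0.2, 0.8, 0.9), "rain.n.01"),
    ((0.5, 0.5, 0.1), "door.n.01"),
]

def euclidean(a, b):
    # Straight-line distance between two feature vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def annotate(query):
    """Return the WordNet concept of the nearest database sound (1-NN)."""
    _, concept = min(SOUND_DB, key=lambda entry: euclidean(query, entry[0]))
    return concept

print(annotate((0.85, 0.15, 0.15)))  # → dog.n.01
```

Scaling this to 50,000 sounds and 1,600 concepts mainly changes the feature extraction and the neighbor search, not the basic scheme: linking each database sound to an unambiguous WordNet concept is what removes the need for a dedicated classifier per class.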
CITATION STYLE
Cano, P., Koppenberger, M., Le Groux, S., Ricard, J., Wack, N., & Herrera, P. (2005). Nearest-neighbor automatic sound annotation with a WordNet taxonomy. Journal of Intelligent Information Systems, 24(2–3), 99–111. https://doi.org/10.1007/s10844-005-0318-4