It is difficult to understand a multimedia signal without being able to say something about its semantic content, or meaning. This chapter describes two algorithms that help bridge the semantic gap in our understanding of multimedia. In both cases we represent the semantic content of a multimedia signal as a point in a high-dimensional space. In the first case, we represent the sentences of a video as a time-varying semantic signal. We look for discontinuities in this signal, of different sizes in a one-dimensional scale space, as indications of topic changes. By sorting these changes, we can create a hierarchical segmentation of the video based on its semantic content. The same formalism can be applied to color information, and we compare the temporal correlation properties of the different media.

In the second half of this chapter we describe an approach that connects sounds to semantics. We call this semantic-audio retrieval; the goal is to find a (non-speech) audio signal that fits a query, or to describe a (non-speech) audio signal using the appropriate words. We make this connection by building and clustering high-dimensional vector descriptions of the audio signal and its corresponding semantic description. We then build models that link the two spaces, so that a query in one space can be mapped into a model describing the probability of correspondence for points in the opposing space.
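The scale-space segmentation idea can be sketched in a few lines. The following is only an illustrative sketch, not the chapter's exact method: it uses a one-dimensional signal standing in for the high-dimensional semantic vectors, a hand-picked set of Gaussian smoothing scales, and gradient-magnitude maxima as the definition of a discontinuity. Boundaries that survive coarser smoothing are ranked higher, which yields the hierarchical ordering described above.

```python
import numpy as np

def scale_space_boundaries(signal, sigmas=(1.0, 2.0, 4.0, 8.0)):
    """Rank candidate boundaries in a 1-D signal by the coarsest
    Gaussian smoothing scale at which the discontinuity survives.
    (Illustrative sketch; the scale set is an assumption.)"""
    boundaries = {}
    for sigma in sorted(sigmas, reverse=True):  # coarsest scale first
        # Build a normalized Gaussian kernel, truncated at 3 sigma.
        radius = int(3 * sigma)
        x = np.arange(-radius, radius + 1)
        kernel = np.exp(-x**2 / (2 * sigma**2))
        kernel /= kernel.sum()
        smoothed = np.convolve(signal, kernel, mode="same")
        # A discontinuity is a local maximum of the gradient magnitude.
        grad = np.abs(np.diff(smoothed))
        for i in range(1, len(grad) - 1):
            if grad[i] > grad[i - 1] and grad[i] >= grad[i + 1]:
                # Record the coarsest scale at which this boundary appears.
                boundaries.setdefault(i, sigma)
    # Boundaries that persist under heavier smoothing rank higher.
    return sorted(boundaries, key=boundaries.get, reverse=True)
```

Sorting the surviving boundaries by scale gives a coarse-to-fine segmentation: the top-ranked boundary splits the video at its largest topic change, and lower-ranked ones refine each segment.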
Slaney, M., Ponceleon, D., & Kaufman, J. (2003). Understanding the Semantics of Media. In Video Mining (pp. 219–252). Springer US. https://doi.org/10.1007/978-1-4757-6928-9_8