Fusion of text and audio semantic representations through CCA


Abstract

Humans are natural multimedia processing machines. Multimedia is a domain of multiple modalities, including audio, text, and images. A central aspect of multimedia processing is the coherent integration of media from different modalities into a single entity. Multimodal information fusion architectures become a necessity when not all information channels are available at all times. In this paper, we introduce a multimodal fusion of audio signals and lyrics in a shared semantic space through canonical correlation analysis. We propose an audio retrieval system based on extended semantic analysis of audio signals, and combine this model with a tf-idf representation of lyrics to achieve a multimodal retrieval system. We use canonical correlation analysis and supervised learning methods as a basis for relating audio and lyrics information. Our experimental evaluation indicates that the proposed model outperforms prior approaches based on simple canonical correlation methods. Finally, the efficiency of the proposed method allows it to handle large music and lyrics collections, enabling users to explore relevant lyrics information for music datasets.
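To make the fusion idea concrete, the sketch below shows how one might pair a tf-idf representation of lyrics with audio feature vectors and project both into a shared space via canonical correlation analysis, using scikit-learn. This is a minimal illustration under assumed inputs (the random audio features, toy lyrics, and hyperparameters are placeholders), not the authors' actual pipeline, which additionally uses extended semantic analysis of the audio signals and supervised learning.

```python
# Minimal sketch of CCA-based audio/lyrics fusion, assuming scikit-learn.
# All data here is synthetic and purely illustrative.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cross_decomposition import CCA

# Hypothetical paired data: one audio feature vector and one lyric per song.
lyrics = [
    "love me tender love me sweet",
    "we will we will rock you",
    "hey jude do not make it bad",
    "yellow submarine yellow submarine",
]
rng = np.random.default_rng(0)
audio_features = rng.standard_normal((len(lyrics), 20))  # e.g. MFCC statistics

# Text modality: tf-idf representation of the lyrics.
tfidf = TfidfVectorizer()
text_features = tfidf.fit_transform(lyrics).toarray()

# CCA finds projections of both modalities that maximize their correlation;
# retrieval can then compare songs and lyrics directly in this shared space.
cca = CCA(n_components=2)
audio_proj, text_proj = cca.fit_transform(audio_features, text_features)

# Cross-modal retrieval: find the lyric whose projection is closest to a
# query song's audio projection (cosine similarity).
query = audio_proj[0]
sims = text_proj @ query / (
    np.linalg.norm(text_proj, axis=1) * np.linalg.norm(query) + 1e-12
)
print("Best-matching lyric:", lyrics[int(np.argmax(sims))])
```

In a real system, the audio features would come from a semantic analysis of the signal rather than random noise, and the learned projections would be fit on a large training collection before being applied to unseen songs.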

Cite


APA

Aryafar, K., & Shokoufandeh, A. (2015). Fusion of text and audio semantic representations through CCA. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 8869, pp. 66–73). Springer Verlag. https://doi.org/10.1007/978-3-319-14899-1_7
