Abstract
Deep neural networks have boosted the convergence of multimedia data analytics into a unified framework shared by practitioners in natural language, vision and speech. Image captioning, lip reading and video sonorization are some of the first applications of a new and exciting field of research that exploits the generalization properties of deep neural representations. This tutorial will first review the basic neural architectures used to encode and decode vision, text and audio, and then review those models that have successfully translated information across modalities.
Citation
Giro-I-Nieto, X. (2020). One perceptron to rule them all: Language, vision, audio and speech. In ICMR 2020 - Proceedings of the 2020 International Conference on Multimedia Retrieval (pp. 7–8). Association for Computing Machinery. https://doi.org/10.1145/3372278.3390740