Comparing data sources and architectures for deep visual representation learning in semantics

Douwe Kiela; Anita L. Vero; Stephen Clark

Conference Proceedings, Open Access

EMNLP 2016 - Conference on Empirical Methods in Natural Language Processing, Proceedings (2016) 447-456

DOI: 10.18653/v1/d16-1043

17 Citations

88 Readers

Abstract

Multi-modal distributional models learn grounded representations for improved performance in semantics. Deep visual representations, learned using convolutional neural networks, have been shown to achieve particularly high performance. In this study, we systematically compare deep visual representation learning techniques, experimenting with three well-known network architectures. In addition, we explore the various data sources that can be used for retrieving relevant images, showing that images from search engines perform as well as, or better than, those from manually crafted resources such as ImageNet. Furthermore, we explore the optimal number of images and the multi-lingual applicability of multi-modal semantics. We hope that these findings can serve as a guide for future research in the field.

Cite (APA)

Kiela, D., Vero, A. L., & Clark, S. (2016). Comparing data sources and architectures for deep visual representation learning in semantics. In EMNLP 2016 - Conference on Empirical Methods in Natural Language Processing, Proceedings (pp. 447–456). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/d16-1043

Readers' Seniority

PhD / Post grad / Masters / Doc: 27 (68%)

Researcher: 6 (15%)

Professor / Associate Prof.: 5 (13%)

Lecturer / Post doc: 2

Readers' Discipline

Computer Science: 38 (79%)

Linguistics: 6 (13%)

Social Sciences: 2

Engineering: 2
