A survey of current datasets for vision and language research

Francis Ferraro; Nasrin Mostafazadeh; Ting Hao Huang; Lucy Vanderwende; Jacob Devlin; Michel Galley; Margaret Mitchell

Conference Proceedings (Open Access)

Conference Proceedings - EMNLP 2015: Conference on Empirical Methods in Natural Language Processing (2015) 207-213

DOI: 10.18653/v1/d15-1021

Abstract

Integrating vision and language has long been a dream in work on artificial intelligence (AI). In the past two years, we have witnessed an explosion of work that brings together vision and language, from images to videos and beyond. The available corpora have played a crucial role in advancing this area of research. In this paper, we propose a set of quality metrics for evaluating and analyzing vision & language datasets and categorize the datasets accordingly. Our analyses show that the most recent datasets use more complex language and more abstract concepts; however, each has different strengths and weaknesses.

Cite

Ferraro, F., Mostafazadeh, N., Huang, T. H., Vanderwende, L., Devlin, J., Galley, M., & Mitchell, M. (2015). A survey of current datasets for vision and language research. In Conference Proceedings - EMNLP 2015: Conference on Empirical Methods in Natural Language Processing (pp. 207–213). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/d15-1021
