Abstract
This paper describes the Trento Universal Human Object Interaction dataset (TUHOI), which is dedicated to human-object interactions in images. Recognizing human actions is an important yet challenging task. Most available datasets in this field are limited in the number of actions and objects they cover. A large dataset with diverse actions and human-object interactions is needed for training and evaluating sophisticated and robust human action recognition systems, especially systems that combine knowledge learned from language and vision. We introduce an image collection with more than two thousand actions, annotated through crowdsourcing. We review publicly available datasets, describe the annotation process of our image collection, and report statistics of the dataset. Finally, we present experimental results on the dataset, including human action recognition based on objects and an analysis of the relation between human-object positions in images and prepositions in language.
Citation
Le, D. T., Uijlings, J., & Bernardi, R. (2014). TUHOI: Trento Universal Human Object Interaction Dataset. In V and L Net 2014 - 3rd Annual Meeting of the EPSRC Network on Vision and Language and 1st Technical Meeting of the European Network on Integrating Vision and Language, A Workshop of the 25th International Conference on Computational Linguistics, COLING 2014 - Proceedings (pp. 17–24). Association for Computational Linguistics (ACL). https://doi.org/10.3115/v1/w14-5403