There has been continuous growth in the volume and ubiquity of video material. It has become essential to define video semantics in order to aid the searchability and retrieval of this data. Although annotating such data with keywords is relatively well researched, the quality of annotation can be improved by describing videos in natural language. We are exploring approaches to generating natural language descriptions of the interrelations between human activities in a video stream. This paper focuses on the creation of a dataset that can be used for development and evaluation. To this end, a corpus has been generated consisting of video clips, manually selected from the Hollywood2 dataset, together with their natural language descriptions. Analysis of the hand annotations provides insights into human interests and thoughts. Such a resource can be used to evaluate automatic natural language generation systems for video.
Al Harbi, N., & Gotoh, Y. (2016). Natural language descriptions of human activities scenes: Corpus generation and analysis. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (pp. 39–47). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/w16-3205