Activities and events in our lives are structured, be it a vacation, a camping trip, or a wedding. While individual details vary, each of these scenarios follows characteristic patterns. For example, a wedding typically consists of a sequence of events such as walking down the aisle, exchanging vows, and dancing. In this paper, we present a data-driven approach to learning event knowledge from a large collection of photo albums. We formulate the task as constrained optimization to induce the prototypical temporal structure of an event, integrating both visual and textual cues. Comprehensive evaluation demonstrates that it is possible to learn multimodal knowledge of event structure from noisy web content.
Bosselut, A., Chen, J., Warren, D., Hajishirzi, H., & Choi, Y. (2016). Learning prototypical event structure from photo albums. In 54th Annual Meeting of the Association for Computational Linguistics, ACL 2016 - Long Papers (Vol. 3, pp. 1769–1779). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/p16-1167