Summarizing first-person videos from third persons’ points of views

Abstract

Video highlighting and summarization are topics of interest in computer vision, benefiting applications such as viewing, search, and storage. However, most existing studies train on third-person videos, and the resulting models do not generalize easily to highlighting first-person ones. To derive an effective model for summarizing first-person videos, we propose a novel deep neural network architecture that describes and discriminates vital spatiotemporal information across videos captured from different points of view. Our model is trained in a semi-supervised setting, using fully annotated third-person videos, unlabeled first-person videos, and a small number of annotated first-person videos. We present qualitative and quantitative evaluations on both benchmark datasets and first-person video datasets we collected.
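The abstract does not specify the architecture's details, so the sketch below is only a rough illustration of the semi-supervised, cross-view setup it describes: supervised per-frame highlight losses on labeled third-person and labeled first-person clips, plus an adversarial view-alignment term on unlabeled first-person clips (one plausible reading of "discriminating ... across videos with different points of view"). All names (Scorer, ViewDiscriminator), the GRU encoder, the loss weighting lam, and every hyperparameter are assumptions for illustration, not the authors' actual method.

```python
import torch
import torch.nn as nn

# Hypothetical shapes: each video is a sequence of T frame features of dim D.
T, D, H = 64, 512, 128

class Scorer(nn.Module):
    """Illustrative highlight scorer: a shared temporal encoder followed by
    a per-frame importance head (not the paper's actual architecture)."""
    def __init__(self, d_in=D, d_hid=H):
        super().__init__()
        self.encoder = nn.GRU(d_in, d_hid, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * d_hid, 1)    # per-frame highlight logit

    def forward(self, x):                      # x: (B, T, D)
        h, _ = self.encoder(x)                 # (B, T, 2H)
        return self.head(h).squeeze(-1)        # (B, T) raw scores

class ViewDiscriminator(nn.Module):
    """Tries to tell encoded third-person clips from first-person ones, so
    the encoder is pushed toward view-invariant features (an assumed
    adversarial-alignment reading of the abstract)."""
    def __init__(self, d_hid=H):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2 * d_hid, d_hid), nn.ReLU(),
                                 nn.Linear(d_hid, 1))

    def forward(self, h):                      # h: (B, T, 2H)
        return self.net(h.mean(dim=1))         # (B, 1) view logit

scorer, disc = Scorer(), ViewDiscriminator()
opt_s = torch.optim.Adam(scorer.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(disc.parameters(), lr=1e-4)
bce = nn.BCEWithLogitsLoss()

def step(third_x, third_y, fp_lab_x, fp_lab_y, fp_unlab_x, lam=0.1):
    """One semi-supervised update over the three data sources the abstract
    lists: annotated third-person, a few annotated first-person, and
    unlabeled first-person clips."""
    # Supervised per-frame highlight loss on both labeled sources.
    sup = bce(scorer(third_x), third_y) + bce(scorer(fp_lab_x), fp_lab_y)

    # Encoder features for the adversarial alignment term.
    h3, _ = scorer.encoder(third_x)
    h1, _ = scorer.encoder(fp_unlab_x)

    # Discriminator update: third-person -> 1, first-person -> 0.
    d_loss = bce(disc(h3.detach()), torch.ones(h3.size(0), 1)) + \
             bce(disc(h1.detach()), torch.zeros(h1.size(0), 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Encoder tries to make first-person features look third-person.
    adv = bce(disc(h1), torch.ones(h1.size(0), 1))
    loss = sup + lam * adv
    opt_s.zero_grad(); loss.backward(); opt_s.step()
    return loss.item()

# Toy batch of random features, just to exercise the update once.
B = 4
print(step(torch.randn(B, T, D), torch.rand(B, T).round(),
           torch.randn(B, T, D), torch.rand(B, T).round(),
           torch.randn(B, T, D)))
```

At test time, such a scorer would rank frames of an unseen first-person video by its per-frame logits and keep the top-scoring segments as the summary; this usage pattern is likewise an assumption, standard for highlight models of this kind.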

Citation (APA)

Ho, H. I., Chiu, W. C., & Wang, Y. C. F. (2018). Summarizing first-person videos from third persons’ points of views. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11219 LNCS, pp. 72–89). Springer Verlag. https://doi.org/10.1007/978-3-030-01267-0_5
