Egocentric video offers a unique view of event participants, revealing their attention, vision, and interactions with objects. In this paper, we introduce Ego-Deliver, a new large-scale egocentric video benchmark recorded by takeaway riders during their daily work. To the best of our knowledge, Ego-Deliver is the first attempt at understanding activities in the takeaway delivery process, while also being one of the largest egocentric video action datasets to date. The dataset provides 5,360 videos with more than 139,000 multi-track annotations and 45 different attributes, which we believe will be pivotal to future research in this area. We also introduce FS-Net, a new anchor-free action detection architecture that handles extreme variation in action duration. FS-Net partitions videos into fragments and builds dynamic graphs over the fragments, aggregating multi-fragment context to boost fragment classification; a splicing and scoring module then produces the final action proposals. Our experimental evaluation confirms that the proposed framework outperforms existing approaches on the Ego-Deliver benchmark and is competitive on other popular benchmarks. In its current version, Ego-Deliver is used for a comprehensive comparison of activity detection algorithms; we also demonstrate its application to action recognition with promising results. The dataset, toolkits, and baseline results will be made available at: https://egodeliver.github.io/EgoDeliver_Dataset/
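To make the fragment-graph pipeline concrete, below is a minimal PyTorch sketch of the idea described in the abstract: classify fragments with context aggregated over a dynamic graph, then splice contiguous positive fragments into scored proposals. The abstract does not specify FS-Net's internals, so every design choice here (a k-NN graph in fragment feature space, mean-pooled neighbor aggregation, mean-confidence scoring, and the 0.5 threshold) is an assumption for illustration, not the authors' implementation.

```python
# Hypothetical sketch of a fragment-graph action detector; all module names,
# layer sizes, and heuristics are assumptions, not FS-Net's actual design.
import torch
import torch.nn as nn
import torch.nn.functional as F


class FragmentGraphClassifier(nn.Module):
    """Classifies video fragments using context aggregated over a dynamic
    k-NN graph built in fragment feature space (assumed graph rule)."""

    def __init__(self, feat_dim: int = 256, num_classes: int = 2, k: int = 4):
        super().__init__()
        self.k = k
        self.fuse = nn.Linear(2 * feat_dim, feat_dim)  # fuse self + neighbor context
        self.cls = nn.Linear(feat_dim, num_classes)    # per-fragment logits

    def forward(self, frags: torch.Tensor) -> torch.Tensor:
        # frags: (T, D), one feature vector per fragment.
        dist = torch.cdist(frags, frags)               # pairwise feature distances
        knn = dist.topk(self.k + 1, largest=False).indices[:, 1:]  # drop self-edge
        ctx = frags[knn].mean(dim=1)                   # aggregate neighbor context
        fused = F.relu(self.fuse(torch.cat([frags, ctx], dim=-1)))
        return self.cls(fused)                         # (T, num_classes)


def splice_proposals(probs: torch.Tensor, thresh: float = 0.5):
    """Splice consecutive above-threshold fragments into proposals and score
    each proposal by its mean fragment confidence (assumed scoring rule)."""
    active = probs > thresh
    proposals, start = [], None
    for t, on in enumerate(active.tolist()):
        if on and start is None:
            start = t
        elif not on and start is not None:
            proposals.append((start, t - 1, probs[start:t].mean().item()))
            start = None
    if start is not None:
        proposals.append((start, len(probs) - 1, probs[start:].mean().item()))
    return proposals  # list of (start_fragment, end_fragment, score)


if __name__ == "__main__":
    model = FragmentGraphClassifier()
    feats = torch.randn(20, 256)                       # 20 fragments, toy features
    probs = model(feats).softmax(dim=-1)[:, 1]         # P(action) per fragment
    print(splice_proposals(probs))
```

Because proposals emerge from splicing classified fragments rather than from predefined anchor windows, the approach is anchor-free by construction, which is what lets it accommodate both very short and very long actions.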
Qiu, H., He, P., Liu, S., Shao, W., Zhang, F., Wang, J., … Wang, F. (2021). Ego-Deliver: A Large-Scale Dataset for Egocentric Video Analysis. In Proceedings of the 29th ACM International Conference on Multimedia (MM '21) (pp. 1847–1855). Association for Computing Machinery. https://doi.org/10.1145/3474085.3475336