Unsupervised learning of deep feature representation for clustering egocentric actions

34 citations · 59 Mendeley readers

Abstract

The popularity of wearable cameras in life logging, law enforcement, assistive vision, and other similar applications is leading to an explosion in the generation of egocentric video content. First-person action recognition is an important aspect of the automatic analysis of such videos. Annotating such videos is hard, not only because of obvious scalability constraints, but also because of the privacy issues often associated with egocentric videos. This motivates the use of unsupervised methods for egocentric video analysis. In this work, we propose a robust and generic unsupervised approach for first-person action clustering. Unlike contemporary approaches, our technique is neither limited to a particular class of actions nor does it require priors such as pre-training or fine-tuning. We learn time-sequenced visual and flow features from an array of weak feature extractors based on convolutional and LSTM autoencoder networks. We demonstrate that clustering such features leads to the discovery of semantically meaningful actions present in the video. We validate our approach on four disparate public egocentric action datasets amounting to approximately 50 hours of video. We show that our approach surpasses supervised state-of-the-art accuracies without using the action labels.
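The pipeline the abstract describes — compressing per-frame features with an LSTM autoencoder and clustering the resulting embeddings — can be illustrated with a minimal sketch. This is not the authors' code: the PyTorch model, the feature dimensions, the random "frame features", and the use of k-means are all illustrative assumptions.

```python
# Hedged sketch: an LSTM autoencoder that compresses a sequence of per-frame
# features into a fixed-length embedding, then k-means clustering of the
# embeddings. All sizes and data are toy stand-ins, not the paper's setup.
import torch
import torch.nn as nn
from sklearn.cluster import KMeans

class LSTMAutoencoder(nn.Module):
    def __init__(self, feat_dim=64, hidden_dim=32):
        super().__init__()
        self.encoder = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.decoder = nn.LSTM(hidden_dim, feat_dim, batch_first=True)

    def forward(self, x):                      # x: (batch, time, feat_dim)
        _, (h, _) = self.encoder(x)            # h: (1, batch, hidden_dim)
        z = h[-1]                              # fixed-length clip embedding
        # repeat the embedding across time and decode back to the input space
        rep = z.unsqueeze(1).repeat(1, x.size(1), 1)
        recon, _ = self.decoder(rep)
        return recon, z

# Toy usage: random tensors standing in for CNN/optical-flow frame descriptors.
torch.manual_seed(0)
model = LSTMAutoencoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
clips = torch.randn(16, 10, 64)               # 16 clips, 10 frames, 64-d features
for _ in range(5):                            # a few reconstruction steps
    recon, _ = model(clips)
    loss = nn.functional.mse_loss(recon, clips)
    opt.zero_grad()
    loss.backward()
    opt.step()

with torch.no_grad():
    _, z = model(clips)                       # unsupervised clip embeddings
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(z.numpy())
# labels assigns one cluster id per clip; in the paper's setting these clusters
# are shown to align with semantically meaningful actions.
```

The key design point this sketch mirrors is that no action labels are used anywhere: the reconstruction loss drives representation learning, and clustering is applied post hoc to the learned embeddings.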

References

Action recognition with improved trajectories (3,034 citations)
Action recognition by dense trajectories (1,941 citations)
Unsupervised learning of human action categories using spatial-temporal words (1,218 citations)

Cited by

Human POSEitioning System (HPS): 3D Human Pose Estimation and Self-localization in Large Scenes from Body-Mounted Sensors (101 citations)
Watching a small portion could be as good as watching all: Towards efficient video classification (79 citations)
A perceptual prediction framework for self supervised event segmentation (69 citations)

Citation (APA)

Bhatnagar, B. L., Singh, S., Arora, C., & Jawahar, C. V. (2017). Unsupervised learning of deep feature representation for clustering egocentric actions. In IJCAI International Joint Conference on Artificial Intelligence (Vol. 0, pp. 1447–1453). International Joint Conferences on Artificial Intelligence. https://doi.org/10.24963/ijcai.2017/200

Readers' Seniority

PhD / Post grad / Masters / Doc: 36 (82%)
Researcher: 4 (9%)
Professor / Associate Prof.: 3 (7%)
Lecturer / Post doc: 1 (2%)

Readers' Discipline

Computer Science: 31 (74%)
Engineering: 7 (17%)
Mathematics: 2 (5%)
Business, Management and Accounting: 2 (5%)
