Deep learning has shown remarkable progress on a wide range of problems. However, efficient training of such models requires large-scale datasets, and obtaining annotations for them can be challenging and costly. In this work, we explore freely available, user-generated labels from web videos for video understanding. We create a benchmark dataset consisting of around 2 million videos with associated user-generated annotations and other meta information. We use the collected dataset for action classification and demonstrate its usefulness by transferring to two existing small-scale annotated datasets, UCF101 and HMDB51. We study different loss functions and two pretraining strategies: simple (supervised) and self-supervised learning. We also show how a network pretrained on the proposed dataset can improve robustness to video corruption and label noise in downstream datasets. We present NoisyActions2M as a benchmark dataset for learning from noisy labels in video understanding. The dataset, code, and trained models are publicly available for future research, and a longer version of our paper is also available.
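The abstract mentions comparing loss functions for learning from noisy labels without naming them. As a purely illustrative sketch, not the authors' method, the snippet below implements one widely used noise-robust loss, generalized cross-entropy (Zhang & Sabuncu, 2018), in PyTorch; the names `video_model`, `clips`, and `noisy_labels` in the usage comment are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GeneralizedCrossEntropy(nn.Module):
    """Generalized cross-entropy loss, a common noise-robust alternative
    to standard cross-entropy. Shown only as an example of the kind of
    loss one might compare when training on noisy labels; the abstract
    does not specify which losses the authors evaluated."""

    def __init__(self, q: float = 0.7):
        super().__init__()
        self.q = q  # q -> 0 recovers cross-entropy; q = 1 gives MAE

    def forward(self, logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
        probs = F.softmax(logits, dim=1)
        # Probability the model assigns to the (possibly noisy) target class.
        p_y = probs.gather(1, targets.unsqueeze(1)).squeeze(1).clamp(min=1e-7)
        # L_q = (1 - p_y^q) / q, averaged over the batch.
        return ((1.0 - p_y.pow(self.q)) / self.q).mean()

# Hypothetical usage with any clip-level video classifier:
# criterion = GeneralizedCrossEntropy(q=0.7)
# loss = criterion(video_model(clips), noisy_labels)
```

The parameter `q` interpolates between standard cross-entropy (fast convergence, sensitive to label noise) and mean absolute error (slower convergence, more robust to noise), which is why losses of this family are natural baselines on a noisily labeled dataset like this one.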
CITATION STYLE
Sharma, M., Patra, R. A., Desai, H., Vyas, S., Rawat, Y., & Shah, R. R. (2021). NoisyActions2M: A Multimedia Dataset for Video Understanding from Noisy Labels. In ACM International Conference Proceeding Series. Association for Computing Machinery. https://doi.org/10.1145/3469877.3490580