Rich feature hierarchies for accurate object detection and semantic segmentation

Ross Girshick; Jeff Donahue; Trevor Darrell; Jitendra Malik

Conference Proceedings

Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (2014) 580-587

DOI: 10.1109/CVPR.2014.81

30.7k Citations

17.6k Readers

Abstract

Object detection performance, as measured on the canonical PASCAL VOC dataset, has plateaued in the last few years. The best-performing methods are complex ensemble systems that typically combine multiple low-level image features with high-level context. In this paper, we propose a simple and scalable detection algorithm that improves mean average precision (mAP) by more than 30% relative to the previous best result on VOC 2012 - achieving a mAP of 53.3%. Our approach combines two key insights: (1) one can apply high-capacity convolutional neural networks (CNNs) to bottom-up region proposals in order to localize and segment objects and (2) when labeled training data is scarce, supervised pre-training for an auxiliary task, followed by domain-specific fine-tuning, yields a significant performance boost. Since we combine region proposals with CNNs, we call our method R-CNN: Regions with CNN features. We also present experiments that provide insight into what the network learns, revealing a rich hierarchy of image features. Source code for the complete system is available at http://www.cs.berkeley.edu/~rbg/rcnn.
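The first insight in the abstract, scoring features computed on each region proposal, can be illustrated with a minimal structural sketch. Everything concrete here is an assumption made for illustration and is not taken from the abstract: the sliding-grid `propose_regions` stands in for a bottom-up proposal method, `toy_features` stands in for the CNN, and the linear `score` stands in for a trained classifier. It shows only the control flow (propose, crop, featurize, score, threshold), not the authors' implementation.

```python
from typing import List, Sequence, Tuple

Box = Tuple[int, int, int, int]   # (x0, y0, x1, y1), upper bounds exclusive
Image = List[List[float]]         # grayscale image as rows of pixel values


def propose_regions(image: Image, size: int, stride: int) -> List[Box]:
    """Hypothetical proposal generator: a regular grid of square boxes."""
    height, width = len(image), len(image[0])
    return [
        (x, y, x + size, y + size)
        for y in range(0, height - size + 1, stride)
        for x in range(0, width - size + 1, stride)
    ]


def crop(image: Image, box: Box) -> Image:
    """Cut the pixels inside a box out of the image."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in image[y0:y1]]


def toy_features(patch: Image) -> List[float]:
    """Placeholder for the CNN: returns [mean, max] of the patch."""
    pixels = [p for row in patch for p in row]
    return [sum(pixels) / len(pixels), max(pixels)]


def score(features: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Placeholder linear classifier over the region features."""
    return sum(f * w for f, w in zip(features, weights)) + bias


def detect(image: Image, weights: Sequence[float], bias: float,
           threshold: float, size: int = 2, stride: int = 2):
    """Score every proposed region and keep those above the threshold."""
    detections = []
    for box in propose_regions(image, size, stride):
        s = score(toy_features(crop(image, box)), weights, bias)
        if s > threshold:
            detections.append((box, s))
    # Highest-scoring regions first.
    return sorted(detections, key=lambda d: d[1], reverse=True)


# A 4x4 image whose bottom-right 2x2 block is bright.
image = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
]
result = detect(image, weights=[1.0, 0.0], bias=0.0, threshold=0.5)
print(result)  # [((2, 2, 4, 4), 1.0)]
```

In this toy setup only the bright block clears the threshold. A real system along the lines the abstract describes would replace each placeholder with a learned component and would typically also resolve overlapping detections, which this sketch omits.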

Citation (APA)

Girshick, R., Donahue, J., Darrell, T., & Malik, J. (2014). Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (pp. 580–587). IEEE Computer Society. https://doi.org/10.1109/CVPR.2014.81
