The main goal of object detection, one of the most challenging problems in computer vision (CV), is to predict a bounding box and a category label for each object of interest in an image or a point cloud. Object detection therefore has a variety of exciting downstream applications, such as self-driving cars, checkout-less shopping, smart cities, and cancer detection. Over the past five years, deep learning has revolutionized the field, and two-stage approaches to object detection have given way to simpler, more efficient one-stage models. Over the same period, mean average precision (mAP) on benchmarks such as the COCO Object Detection dataset has improved almost 4x, from 15% (Fast R-CNN, a two-stage approach) to 55% (EfficientDet-D7x, a one-stage approach). This tutorial looks under the hood of state-of-the-art object detection systems, including two-stage, one-stage, and more recent transformer-based approaches. It builds out some of their associated detection pipelines in a Jupyter Notebook using Python, OpenCV, PyTorch, Keras, and TensorFlow. While the primary focus is on object detection in digital images from cameras and videos, the tutorial also introduces object detection in 3D point clouds.
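The mAP figures quoted above rest on matching predicted boxes to ground-truth boxes by intersection over union (IoU). The following is a minimal sketch, assuming boxes given as `(x1, y1, x2, y2)` corners, that computes IoU and the average precision (AP) for a single class in a single image at one IoU threshold (0.5). It is a simplified illustration, not the official COCO evaluation, which averages AP over IoU thresholds from 0.50 to 0.95 and across categories and uses a different interpolation scheme. The function names `iou` and `average_precision` are hypothetical and are not taken from the tutorial.

```python
from typing import List, Tuple

# Hypothetical box format: (x1, y1, x2, y2) corner coordinates in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def average_precision(preds: List[Tuple[float, Box]], gts: List[Box],
                      thr: float = 0.5) -> float:
    """Single-class AP for one image (all-point interpolation).

    preds: (confidence score, box) pairs; gts: ground-truth boxes.
    """
    if not gts:
        return 0.0
    # Process detections from most to least confident.
    preds = sorted(preds, key=lambda p: p[0], reverse=True)
    matched = [False] * len(gts)
    tp = fp = 0
    precisions, recalls = [], []
    for _, box in preds:
        # Match to the best still-unmatched ground truth with IoU >= thr.
        best_i, best_iou = -1, thr
        for i, g in enumerate(gts):
            if matched[i]:
                continue
            o = iou(box, g)
            if o >= best_iou:
                best_i, best_iou = i, o
        if best_i >= 0:
            matched[best_i] = True
            tp += 1
        else:
            fp += 1  # unmatched or duplicate detection
        precisions.append(tp / (tp + fp))
        recalls.append(tp / len(gts))
    # Area under the precision-recall curve, using for each recall level
    # the maximum precision achieved at that recall or higher.
    ap, prev_r = 0.0, 0.0
    for k in range(len(precisions)):
        ap += (recalls[k] - prev_r) * max(precisions[k:])
        prev_r = recalls[k]
    return ap


if __name__ == "__main__":
    gts = [(0, 0, 10, 10), (20, 20, 30, 30)]
    preds = [(0.9, (0, 0, 10, 10)), (0.8, (50, 50, 60, 60)), (0.7, (20, 20, 30, 30))]
    print(round(average_precision(preds, gts), 4))  # → 0.8333
```

In the usage example, the false positive at confidence 0.8 lowers precision before the second object is found, so AP falls below 1.0 even though both objects are eventually detected. A full mAP would repeat this per class (and, for COCO, per IoU threshold) across the whole dataset and average the results.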
Shanahan, J. G. (2020). Introduction to Computer Vision and Realtime Deep Learning-based Object Detection. In International Conference on Information and Knowledge Management, Proceedings (pp. 3515–3516). Association for Computing Machinery. https://doi.org/10.1145/3340531.3412177