In recent years, object detection has attracted increasing attention owing to the rapid development of computer vision and artificial intelligence. Early traditional object detection methods, such as histogram of oriented gradients (HOG) and the deformable parts model (DPM), usually follow three steps: region selection, manual feature extraction, and classification and regression. However, manual feature extraction is severely limited for small object detection. Object detection algorithms based on convolutional neural networks can be divided into two-stage and one-stage algorithms. Two-stage algorithms, such as faster region-based convolutional neural network (Faster RCNN) and cascade region-based convolutional neural network (Cascade RCNN), select candidate regions through a region proposal network and then classify and regress these regions to obtain the detection results. However, their accuracy on small objects remains low. One-stage algorithms, such as single shot MultiBox detector (SSD) and you only look once (YOLO), directly locate the object and output its category, thereby improving detection speed to a certain extent. Nevertheless, small object detection has always been a major challenge in the field because small objects occupy few pixels, carry little semantic information, and are easily disturbed by complex scenes. The main challenges in small object detection are as follows. First, small objects have few features. Because small objects are small in scale and cover a small area of the image, extracting useful semantic features during network training is difficult. Second, small object detection is susceptible to interference. Most small objects have low resolution, appear blurred, and carry little visual information, so their features are hard to extract and easily disturbed.
As a result, the detection model struggles to locate and identify small objects accurately, and many false detections and missed detections occur. Third, small object datasets are scarce. At present, most mainstream object detection datasets, such as PASCAL VOC and MS-COCO, are aimed at normal-scale objects; the proportion of small-scale objects in them is insufficient, and their distribution is uneven. The datasets covered in this study that can be used for small object detection all target specific scenes or tasks, such as the DOTA remote sensing object detection dataset and the face detection dataset and benchmark, and are therefore not universal for small object detection. Fourth, small objects tend to cluster and occlude one another. Serious occlusion occurs when small objects gather, and after many downsampling and pooling operations, much of their feature information is lost, which makes detection difficult. Visual small object detection is increasingly important in all fields of life. To address these problems, this study reviews the research status and achievements of small object detection at home and abroad, with the aim of promoting its further development, improving its speed and accuracy, and optimizing its algorithm models. Small object detection methods are analyzed and summarized from the aspects of data augmentation, super resolution, multiscale feature fusion, contextual semantic information, the anchor box mechanism, attention, and specific detection scenarios. Data augmentation was proposed to address the scarcity of general small object datasets, the small number of small objects in public datasets, and the uneven distribution of small objects in images. The earliest data augmentation strategies increase the number of training samples and improve detection performance by deforming, rotating, scaling, cropping, and translating object instances.
Other effective data augmentation methods later emerged, including oversampling the images that contain small objects, scaling and rotating the small objects, and copying them to arbitrary new positions. Data augmentation improves the robustness of a model to a certain extent, alleviates the problems of indistinct visual features and limited object information for small objects, and achieves good results in final detection performance. However, an improperly designed data augmentation strategy may introduce new noise in practical applications and impair feature extraction, which also poses challenges for algorithm design. Because small-scale objects carry little feature information, small object detection methods based on multiscale fusion need to make full use of the detailed information in the image. In existing convolutional neural network (CNN) models for general object detection, multiscale detection helps the model obtain accurate localization information and discriminative features by using low-level feature layers, which benefits the detection and recognition of small-scale objects. First, the feature pyramid network (FPN), which provides strong semantic features at all scales, was introduced. The FPN-based path aggregation network (PANet) followed; it not only achieved good results in instance segmentation but also improved the detection of small objects. In feature fusion, the residual feature enhancement method extracts ratio-invariant context information to reduce the information loss of the highest-level pyramid feature map. At present, many methods are based on multiscale feature fusion, which combines the high-resolution low-level features and the semantically strong high-level features of the network to improve accuracy on small objects. In small object detection, the feature representation of the target is weak.
Thus, the network must be deepened to learn more feature information. Introducing an attention mechanism can often make the network model focus on the channels and regions relevant to the task. In object detection networks, the shallow feature maps lack contextual semantic information about small objects. Incorporating attention mechanisms into the SSD model suppresses irrelevant information during feature fusion and thereby improves the detection accuracy of small objects. In general, the attention mechanism can allocate resources reasonably, quickly find the region of interest, and ignore disturbing information. However, an improperly designed attention mechanism increases the computational cost of the network and degrades the extraction of object features by the model. Finally, future research directions for small object detection are discussed. Visual small object detection is becoming increasingly important in all fields of life, and it is expected to develop in further directions.
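As an illustration of the copy-and-paste data augmentation mentioned above, the following is a minimal sketch rather than the implementation of any surveyed method. It assumes that images are NumPy arrays of shape H x W x C and that bounding boxes are integer pixel coordinates (x1, y1, x2, y2) with exclusive upper bounds; the function name `paste_small_object` and its parameters are hypothetical.

```python
import numpy as np


def paste_small_object(image, box, rng, max_tries=50):
    """Copy the patch inside `box` to a random location that does not
    overlap the original box.

    image: H x W x C array.
    box:   (x1, y1, x2, y2) in pixels, with x2 and y2 exclusive.
    rng:   a numpy.random.Generator.
    Returns (augmented_image, new_box). new_box is None if no free
    location was found within `max_tries` attempts.
    """
    h, w = image.shape[:2]
    x1, y1, x2, y2 = box
    bw, bh = x2 - x1, y2 - y1
    out = image.copy()  # never modify the caller's image
    if bw <= 0 or bh <= 0 or bw > w or bh > h:
        return out, None
    patch = image[y1:y2, x1:x2].copy()
    for _ in range(max_tries):
        # Sample a top-left corner so the patch stays inside the image.
        nx = int(rng.integers(0, w - bw + 1))
        ny = int(rng.integers(0, h - bh + 1))
        # Reject locations that overlap the original object.
        overlaps = nx < x2 and nx + bw > x1 and ny < y2 and ny + bh > y1
        if not overlaps:
            out[ny:ny + bh, nx:nx + bw] = patch
            return out, (nx, ny, nx + bw, ny + bh)
    return out, None


# Usage: a black 64 x 64 image with one small white object.
rng = np.random.default_rng(0)
image = np.zeros((64, 64, 3), dtype=np.uint8)
image[10:14, 20:26] = 255
augmented, new_box = paste_small_object(image, (20, 10, 26, 14), rng)
print(new_box)
```

The sketch only avoids overlap with the source object. A practical pipeline would also check overlap with every other annotated box and add the pasted box to the label set, since, as noted above, a poorly designed augmentation can introduce new noise.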
Pan, X., Jia, N., Mu, Y., & Gao, X. (2023). Survey of small object detection. Journal of Image and Graphics, 28(9), 2587–2615. https://doi.org/10.11834/jig.220455