Video Object Detection with Two-Path Convolutional LSTM Pyramid

Chen Zhang; Joohee Kim

Journal Article (Open Access)

IEEE Access (2020) 8 151681-151691

DOI: 10.1109/ACCESS.2020.3017411

Abstract

One of the major challenges in video object detection is the drastic change in object scale caused by camera motion. In this paper, we propose a two-path Convolutional Long Short-Term Memory (convLSTM) pyramid network designed to extract and convey multi-scale temporal contextual information so that object scale changes can be handled efficiently. The proposed two-path convLSTM pyramid consists of a stack of multi-input convLSTM modules. It is updated along top-down and bottom-up pathways so that temporal contextual information is exploited for both small-to-large and large-to-small scale changes. Each multi-input convLSTM module takes two input feature maps of different resolutions, which lets neighboring convLSTM modules store and exchange temporal contextual information at different scales. The outputs of the convLSTM pyramid network form a feature pyramid in which each feature map contains multi-scale temporal contextual information from earlier frames. The proposed convLSTM pyramid can be combined with various still-image object detectors to improve video object detection performance. Experimental results on the ImageNet VID dataset show that the proposed method achieves state-of-the-art performance and handles scale changes efficiently in video object detection.
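
The abstract describes the multi-input convLSTM module and the two-path pyramid only at a high level. As a rough illustration, the NumPy sketch below shows one way such a structure could be wired. Several details are assumptions, not taken from the paper: the neighbouring-level map is resized by 2x average pooling (bottom-up) or nearest-neighbour upsampling (top-down) and simply concatenated with the module's own input and hidden state; the two pathways use separate modules, with the bottom-up path consuming the top-down outputs; peephole connections are omitted; and weights are random. All names (`MultiInputConvLSTM`, `TwoPathConvLSTMPyramid`, `conv2d_same`, `downsample2`, `upsample2`) are hypothetical, and this is not the authors' implementation.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def conv2d_same(x, w):
    """Zero-padded 'same' 2-D convolution (cross-correlation, as in DL frameworks).
    x: (C_in, H, W); w: (C_out, C_in, k, k) with odd k. Returns (C_out, H, W)."""
    c_out, _, k, _ = w.shape
    p = k // 2
    _, H, W = x.shape
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    out = np.zeros((c_out, H, W))
    for i in range(k):
        for j in range(k):
            # Add the contribution of kernel tap (i, j), summed over input channels.
            out += np.einsum("oc,chw->ohw", w[:, :, i, j], xp[:, i:i + H, j:j + W])
    return out


def downsample2(x):
    """2x2 average pooling; assumes even H and W."""
    C, H, W = x.shape
    return x.reshape(C, H // 2, 2, W // 2, 2).mean(axis=(2, 4))


def upsample2(x):
    """Nearest-neighbour 2x upsampling."""
    return x.repeat(2, axis=1).repeat(2, axis=2)


class MultiInputConvLSTM:
    """Hypothetical convLSTM cell with two inputs: its own-level feature map and
    a neighbouring-level map already resized to the same resolution."""

    def __init__(self, in_ch, nb_ch, hid_ch, rng, k=3):
        # One kernel computes all four gates (input, forget, output, candidate).
        self.w = rng.normal(0.0, 0.1, size=(4 * hid_ch, in_ch + nb_ch + hid_ch, k, k))
        self.b = np.zeros((4 * hid_ch, 1, 1))
        self.hid_ch = hid_ch

    def init_state(self, H, W):
        return np.zeros((self.hid_ch, H, W)), np.zeros((self.hid_ch, H, W))

    def step(self, x, nb, state):
        h, c = state
        gates = conv2d_same(np.concatenate([x, nb, h], axis=0), self.w) + self.b
        i, f, o, g = np.split(gates, 4, axis=0)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)  # cell state carries memory across frames
        h = sigmoid(o) * np.tanh(c)
        return h, c


class TwoPathConvLSTMPyramid:
    """Top-down path (coarse -> fine) followed by bottom-up path (fine -> coarse).
    Levels are ordered finest first; each level halves the resolution."""

    def __init__(self, ch, sizes, seed=0):
        rng = np.random.default_rng(seed)
        self.td = [MultiInputConvLSTM(ch, ch, ch, rng) for _ in sizes]
        self.bu = [MultiInputConvLSTM(ch, ch, ch, rng) for _ in sizes]
        self.td_state = [m.init_state(*s) for m, s in zip(self.td, sizes)]
        self.bu_state = [m.init_state(*s) for m, s in zip(self.bu, sizes)]

    def step(self, feats):
        """Process one frame's backbone features; returns the output pyramid."""
        L = len(feats)
        td = [None] * L
        for lvl in reversed(range(L)):  # top-down: neighbour is the upsampled coarser output
            nb = upsample2(td[lvl + 1]) if lvl + 1 < L else np.zeros_like(feats[lvl])
            self.td_state[lvl] = self.td[lvl].step(feats[lvl], nb, self.td_state[lvl])
            td[lvl] = self.td_state[lvl][0]
        out = [None] * L
        for lvl in range(L):  # bottom-up: neighbour is the downsampled finer output
            nb = downsample2(out[lvl - 1]) if lvl > 0 else np.zeros_like(td[lvl])
            self.bu_state[lvl] = self.bu[lvl].step(td[lvl], nb, self.bu_state[lvl])
            out[lvl] = self.bu_state[lvl][0]
        return out


# Usage: three random "frames", each with a 3-level backbone pyramid of 4 channels.
sizes = [(16, 16), (8, 8), (4, 4)]
net = TwoPathConvLSTMPyramid(ch=4, sizes=sizes)
rng = np.random.default_rng(1)
for t in range(3):
    feats = [rng.normal(size=(4, H, W)) for H, W in sizes]
    pyramid = net.step(feats)
print([p.shape for p in pyramid])  # [(4, 16, 16), (4, 8, 8), (4, 4, 4)]
```

In this sketch each call to `step` processes one frame, and the hidden and cell states kept inside the modules carry context from earlier frames, so the returned list plays the role of the temporally informed feature pyramid that a still-image detector head would consume.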

Citation (APA)

Zhang, C., & Kim, J. (2020). Video Object Detection with Two-Path Convolutional LSTM Pyramid. IEEE Access, 8, 151681–151691. https://doi.org/10.1109/ACCESS.2020.3017411