Abstract
With the rapid development of artificial intelligence techniques such as machine learning and deep learning in remote sensing, data-driven models have become a new research paradigm for automatic information retrieval from remote sensing imagery, placing higher demands on the quantity, quality, and diversity of sample datasets. Before the era of deep learning, classical machine learning methods (e.g., support vector machines and random forests) did not require huge numbers of training samples, so previously published sample datasets were usually relatively small (i.e., less than 100). In recent years, with the rapid development of technologies such as big data, parallel computing, and deep learning, many scholars and research institutions have released a series of sample datasets, laying a solid foundation for a wide range of research and applications such as scene understanding, semantic segmentation, and object detection from remote sensing images. However, a comprehensive review of recently published sample datasets for remote sensing image analysis in the context of big data and deep learning is still lacking. Therefore, the objective of this study is to summarize and analyze these datasets to provide a valuable data reference for relevant researchers. On the basis of literature retrieval and analysis, this paper summarized a total of 124 widely used, open-access, and influential remote sensing image sample datasets published between 2001 and 2020. We reviewed the development of these datasets through metadata analysis of their data sources, application fields, keywords, and data size. Afterward, we analyzed the datasets in terms of their spatial, spectral, and temporal resolutions.
We listed the deep learning models commonly used in the remote sensing field (e.g., convolutional neural networks, recurrent neural networks, and generative adversarial networks) to show how these sample datasets could be used. We also divided the remote sensing image sample datasets into eight categories by application field: scene recognition, land cover/land use classification, thematic information extraction, change detection, ground-object detection, semantic segmentation, quantitative remote sensing, and other applications. For each application field, the typical datasets and related research progress were carefully reviewed. In addition, because deep learning models are data-hungry, training a model that generalizes well from limited labeled data has become a significant issue, especially in remote sensing, where obtaining sufficient labeled samples is time-consuming. To address this issue, we discussed several methods that could improve a model's generalization capability, including sample transfer between spatio-temporal domains, few-shot and zero-shot learning, active learning and semi-supervised learning for sample discovery, and sample generation through generative adversarial networks. Through this multi-dimensional analysis, we provided a comprehensive overview of remote sensing image sample datasets. To the best of our knowledge, this paper is the first review of remote sensing image sample datasets for deep learning, potentially providing a data reference for researchers in related fields.
Feng, Q., Chen, B., Li, G., Yao, X., Gao, B., & Zhang, L. (2022). A review for sample datasets of remote sensing imagery. National Remote Sensing Bulletin, 26(4), 589–605. https://doi.org/10.11834/jrs.20221162