Abstract
Large-scale “foundation models” have gained traction as a way to leverage the vast amounts of unlabeled remote sensing data collected every day. However, due to the multiplicity of Earth Observation (EO) satellites, these models should learn “sensor-agnostic” representations that generalize across sensor characteristics with minimal fine-tuning. This is complicated by data availability: low-resolution imagery, such as Sentinel-2 and Landsat-8 data, is available in large amounts, while very high-resolution aerial or satellite data is less common. To better leverage multisensor data, we introduce cross-sensor self-supervised training and alignment for remote sensing (X-STARS). We design a self-supervised training loss, the multi-sensor alignment dense loss, to align representations across sensors, even with vastly different resolutions, through a contrastive patch-wise mechanism. X-STARS can be applied to train models from scratch, or to adapt large models pretrained on, e.g., low-resolution EO data to new high-resolution sensors in a continual pretraining framework. We collect and release multi-sensors cities-France, a new multisensor dataset, on which we train our X-STARS models, which we then evaluate on seven downstream classification and segmentation tasks. We demonstrate that X-STARS outperforms the state of the art with less data across various conditions of data availability and resolutions.
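To make the idea of contrastive patch-wise alignment more concrete, the following is a minimal sketch of a generic InfoNCE-style loss over paired patch embeddings. It is not the multi-sensor alignment dense loss itself, whose exact formulation is not given in the abstract. The function name `patch_infonce`, the `temperature` parameter, and the assumption that patch `i` from one sensor covers the same ground area as patch `i` from the other are all illustrative choices.

```python
import math


def patch_infonce(low_res_patches, high_res_patches, temperature=0.1):
    """Generic InfoNCE loss over paired patch embeddings (illustrative only).

    Assumption: low_res_patches[i] and high_res_patches[i] are embeddings of
    the same ground area seen by two different sensors. Every other patch in
    high_res_patches is treated as a negative for anchor i.
    """
    def normalize(vector):
        # L2-normalize so that dot products are cosine similarities.
        norm = math.sqrt(sum(x * x for x in vector)) or 1.0
        return [x / norm for x in vector]

    anchors = [normalize(v) for v in low_res_patches]
    candidates = [normalize(v) for v in high_res_patches]

    total = 0.0
    for i, anchor in enumerate(anchors):
        # Temperature-scaled similarity between anchor i and every candidate.
        logits = [
            sum(x * y for x, y in zip(anchor, candidate)) / temperature
            for candidate in candidates
        ]
        # Numerically stable log-sum-exp for the softmax denominator.
        max_logit = max(logits)
        log_denominator = max_logit + math.log(
            sum(math.exp(logit - max_logit) for logit in logits)
        )
        # Cross-entropy with the matching patch (index i) as the positive.
        total += log_denominator - logits[i]
    return total / len(anchors)


# Toy example: two patches with 2-D embeddings.
aligned = [[1.0, 0.0], [0.0, 1.0]]
swapped = [[0.0, 1.0], [1.0, 0.0]]
loss_aligned = patch_infonce(aligned, aligned)
loss_swapped = patch_infonce(aligned, swapped)
```

In this toy setting the loss is small when matching patches have similar embeddings and large when they are mismatched, which is the behavior a cross-sensor alignment objective relies on.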
Marsocci, V., & Audebert, N. (2025). Cross-Sensor Self-Supervised Training and Alignment for Remote Sensing. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 18, 12278–12289. https://doi.org/10.1109/JSTARS.2025.3566457