Prior convolution-based road crack detectors typically learn increasingly abstract visual representations with a growing receptive field via an encoder-decoder architecture. Despite promising accuracy, the progressive reduction in spatial resolution blurs semantic features, leading to coarse and non-contiguous distress detection. To address this, an alternative sequence-to-sequence perspective with a transformer network, termed TransCrack, is introduced for road crack detection. Specifically, an image is decomposed into a grid of fixed-size crack patches, which are flattened, together with position embeddings, into a sequence. We further propose a pure transformer-based encoder with multi-head reduced self-attention modules and feed-forward networks for explicitly modelling long-range dependencies in the sequential input within a global receptive field. More importantly, a simple decoder with a cross-layer aggregation architecture is developed to combine global and local attention across different regions for detailed feature recovery and pixel-wise crack mask prediction. Empirical studies are conducted on three publicly available damage detection benchmarks. The proposed TransCrack achieves state-of-the-art performance, outperforming all counterparts by a substantial margin, and qualitative results further demonstrate its superiority in contiguous crack recognition and fine-grained profile extraction. This article is part of the theme issue 'Artificial intelligence in failure analysis of transportation infrastructure and materials'.
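The patch-to-sequence step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`image_to_patch_sequence`, `sinusoidal_positions`, `embed`) are hypothetical, a single-channel image is assumed, and fixed sine/cosine position codes are used only because the abstract does not say which kind of position embedding TransCrack uses. The learned linear projection that transformer encoders usually apply to each flattened patch is omitted for brevity.

```python
import math


def image_to_patch_sequence(image, patch):
    """Split an H x W image (a list of rows) into non-overlapping
    patch x patch tiles and flatten each tile into one vector.
    Tiles are emitted in row-major grid order."""
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image size must be divisible by the patch size")
    sequence = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            tile = [image[top + r][left + c]
                    for r in range(patch) for c in range(patch)]
            sequence.append(tile)
    return sequence


def sinusoidal_positions(length, dim):
    """Fixed sine/cosine position codes, one row per sequence position.
    Assumption: the abstract does not specify the embedding type."""
    table = []
    for pos in range(length):
        row = []
        for i in range(dim):
            angle = pos / (10000 ** (2 * (i // 2) / dim))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table


def embed(image, patch):
    """Flatten the image into patch tokens and add a position code to each."""
    tokens = image_to_patch_sequence(image, patch)
    positions = sinusoidal_positions(len(tokens), len(tokens[0]))
    return [[t + p for t, p in zip(token, code)]
            for token, code in zip(tokens, positions)]


if __name__ == "__main__":
    # Toy 4 x 4 "image" with pixel values 0..15, split into 2 x 2 patches.
    toy = [[r * 4 + c for c in range(4)] for r in range(4)]
    for token in image_to_patch_sequence(toy, 2):
        print(token)
```

With a 4 x 4 input and 2 x 2 patches the sequence has four tokens of length four. The resulting list of vectors is the kind of sequential input a transformer encoder would then process with self-attention.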
CITATION STYLE
Lin, C., Tian, D., Duan, X., & Zhou, J. (2023). TransCrack: revisiting fine-grained road crack detection with a transformer design. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 381(2254). https://doi.org/10.1098/rsta.2022.0172