Abstract
Significant progress has been made on descriptors and detectors of local features, but several challenging limitations have not been well addressed, such as insufficient localization accuracy and non-discriminative description, especially in repetitive- or blank-texture regions. Coarse feature representation and a limited receptive field are considered the main causes of these limitations. To address these issues, we propose a novel Soft Point-Wise Transformer for Descriptor and Detector, which simultaneously mines long-range intrinsic and cross-scale dependencies of local features. Furthermore, our model leverages distinct transformers based on soft point-wise attention, substantially decreasing memory and computation complexity, especially for high-resolution feature maps. In addition, a multi-level decoder is constructed to guarantee high detection accuracy and discriminative description. Extensive experiments demonstrate that our model outperforms existing state-of-the-art methods on image matching and visual localization benchmarks.
Cite
Wang, Z., Li, X., & Li, Z. (2021). Local Representation is Not Enough: Soft Point-wise Transformer for Descriptor and Detector of Local Features. In IJCAI International Joint Conference on Artificial Intelligence (pp. 1150–1156). International Joint Conferences on Artificial Intelligence. https://doi.org/10.24963/ijcai.2021/159