Local Representation is Not Enough: Soft Point-wise Transformer for Descriptor and Detector of Local Features

Abstract

Significant progress has been made on descriptors and detectors of local features, but several challenging limitations remain, such as insufficient localization accuracy and non-discriminative description, especially in repetitive- or blank-texture regions, which have not been well addressed. Coarse feature representation and a limited receptive field are considered the main causes of these limitations. To address these issues, we propose a novel Soft Point-Wise Transformer for Descriptor and Detector, which simultaneously mines long-range intrinsic and cross-scale dependencies of local features. Furthermore, our model leverages distinct transformers based on soft point-wise attention, substantially decreasing memory and computation complexity, especially for high-resolution feature maps. In addition, a multi-level decoder is constructed to guarantee high detection accuracy and discriminative description. Extensive experiments demonstrate that our model outperforms existing state-of-the-art methods on image matching and visual localization benchmarks.
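
To make the memory claim concrete, the following back-of-the-envelope sketch counts attention weights on a feature map. It is not the paper's method: the abstract does not specify how soft point-wise attention selects keys, so `keys_per_query`, the 120 x 160 map size, and the function names are hypothetical values chosen only to contrast dense attention (every location attends to every other) with attention restricted to a small fixed number of points per query.

```python
def attention_entries(height, width, keys_per_query=None):
    """Count attention weights for one head on a height x width feature map.

    With keys_per_query=None every location attends to every location
    (dense self-attention); otherwise each location attends to a fixed
    number of keys. The fixed-key setting is an illustrative assumption,
    not the mechanism described in the paper.
    """
    num_queries = height * width
    num_keys = num_queries if keys_per_query is None else keys_per_query
    return num_queries * num_keys


def mebibytes(entries, bytes_per_entry=4):
    """Memory for the attention weights alone, assuming float32 storage."""
    return entries * bytes_per_entry / 2**20


# Hypothetical feature-map size and key budget, for illustration only.
dense = attention_entries(120, 160)
restricted = attention_entries(120, 160, keys_per_query=16)

print(f"dense:      {dense} weights, {mebibytes(dense):.2f} MiB")
print(f"restricted: {restricted} weights, {mebibytes(restricted):.2f} MiB")
```

Dense attention grows with the square of the number of locations, while the restricted variant grows linearly, which is why the gap widens on high-resolution feature maps.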

Cite

Wang, Z., Li, X., & Li, Z. (2021). Local Representation is Not Enough: Soft Point-wise Transformer for Descriptor and Detector of Local Features. In IJCAI International Joint Conference on Artificial Intelligence (pp. 1150–1156). International Joint Conferences on Artificial Intelligence. https://doi.org/10.24963/ijcai.2021/159
