Spatial invariance to geometrically distorted data is of great importance in the vision and learning communities. The spatial transformer network (STN) addresses this problem in a computationally efficient manner: it is a differentiable module that can be inserted into a standard CNN architecture to spatially transform data. STN and its variants handle global deformation well but lack the ability to deal with local spatial variation, so achieving better spatial transformation within a neural network remains a pressing problem. To address this issue, we design a module that estimates the difference between the ground truth and the STN output, measured in the form of a motion field, and uses this motion field to refine the spatial transformation predicted by the STN. Experimental results show that our method outperforms state-of-the-art methods on the cluttered MNIST handwritten digit classification task and a planar image alignment task.
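The pipeline described in the abstract, a global STN-style affine warp whose sampling grid is then corrected by a per-pixel motion field, can be sketched in NumPy. This is a minimal illustrative sketch, not the authors' implementation: the function names, the normalized-coordinate convention, and the shape of the flow field are assumptions.

```python
import numpy as np

def affine_grid(theta, H, W):
    """Build an STN-style sampling grid from a 2x3 affine matrix `theta`,
    using normalized coordinates in [-1, 1] (an assumed convention)."""
    ys, xs = np.meshgrid(np.linspace(-1, 1, H), np.linspace(-1, 1, W), indexing="ij")
    coords = np.stack([xs.ravel(), ys.ravel(), np.ones(H * W)])  # 3 x (H*W)
    gx, gy = theta @ coords                                      # 2 x (H*W)
    return gx.reshape(H, W), gy.reshape(H, W)

def bilinear_sample(img, gx, gy):
    """Bilinearly sample `img` at normalized grid coords, clamping at borders."""
    H, W = img.shape
    x = (gx + 1) * (W - 1) / 2.0
    y = (gy + 1) * (H - 1) / 2.0
    x0 = np.clip(np.floor(x).astype(int), 0, W - 1)
    y0 = np.clip(np.floor(y).astype(int), 0, H - 1)
    x1 = np.clip(x0 + 1, 0, W - 1)
    y1 = np.clip(y0 + 1, 0, H - 1)
    wx, wy = x - x0, y - y0
    top = img[y0, x0] * (1 - wx) + img[y0, x1] * wx
    bot = img[y1, x0] * (1 - wx) + img[y1, x1] * wx
    return top * (1 - wy) + bot * wy

def refined_warp(img, theta, flow):
    """Global affine warp refined by a per-pixel motion field `flow` of shape
    (H, W, 2), holding residual offsets in normalized coordinates. In the
    paper this field would be predicted by a learned refinement module; here
    it is simply passed in."""
    H, W = img.shape
    gx, gy = affine_grid(theta, H, W)
    return bilinear_sample(img, gx + flow[..., 0], gy + flow[..., 1])
```

With an identity `theta` and a zero flow, `refined_warp` reproduces the input image; a constant flow of `2/(W-1)` in x shifts sampling by one pixel, illustrating how the motion field locally perturbs the globally predicted grid.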
Citation
Shu, C., Chen, X., Yu, C., & Han, H. (2018). A refined spatial transformer network. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11303 LNCS, pp. 151–161). Springer Verlag. https://doi.org/10.1007/978-3-030-04182-3_14