Grassmann Pooling as Compact Homogeneous Bilinear Pooling for Fine-Grained Visual Classification

23Citations
Citations of this article
96Readers
Mendeley users who have this article in their library.

This article is free to access.

Abstract

Designing discriminative and invariant features is the key to visual recognition. Recently, the bilinear pooled feature matrix of Convolutional Neural Network (CNN) has shown to achieve state-of-the-art performance on a range of fine-grained visual recognition tasks. The bilinear feature matrix collects second-order statistics and is closely related to the covariance matrix descriptor. However, the bilinear feature could suffer from the visual burstiness phenomenon similar to other visual representations such as VLAD and Fisher Vector. The reason is that the bilinear feature matrix is sensitive to the magnitudes and correlations of local CNN feature elements which can be measured by its singular values. On the other hand, the singular vectors are more invariant and reasonable to be adopted as the feature representation. Motivated by this point, we advocate an alternative pooling method which transforms the CNN feature matrix to an orthonormal matrix consists of its principal singular vectors. Geometrically, such orthonormal matrix lies on the Grassmann manifold, a Riemannian manifold whose points represent subspaces of the Euclidean space. Similarity measurement of images reduces to comparing the principal angles between these “homogeneous” subspaces and thus is independent of the magnitudes and correlations of local CNN activations. In particular, we demonstrate that the projection distance on the Grassmann manifold deduces a bilinear feature mapping without explicitly computing the bilinear feature matrix, which enables a very compact feature and classifier representation. Experimental results show that our method achieves an excellent balance of model complexity and accuracy on a variety of fine-grained image classification datasets.

Cite

CITATION STYLE

APA

Wei, X., Zhang, Y., Gong, Y., Zhang, J., & Zheng, N. (2018). Grassmann Pooling as Compact Homogeneous Bilinear Pooling for Fine-Grained Visual Classification. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11207 LNCS, pp. 365–380). Springer Verlag. https://doi.org/10.1007/978-3-030-01219-9_22

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free