Many computer vision tasks are template-based learning tasks, in which multiple instances of a specific concept (e.g., multiple images of a subject's face) are available to the learning algorithm at once. This template structure offers an opportunity to build a robust, discriminative template-level representation that exploits the inherent diversity of feature-level information across the instances within a template. In contrast to other feature-aggregation methods, we propose a new technique that dynamically predicts weights, accounting for factors such as noise and redundancy when assessing the importance of each image-level feature, and uses those weights to aggregate the features into a single template-level representation. Extensive experiments on the MNIST, CIFAR10, UCF101, IJB-A, IJB-B, and Janus CS4 datasets show that the new technique outperforms statistical feature-pooling methods as well as other neural-network-based aggregation mechanisms across a broad set of tasks.
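To make the idea concrete, here is a minimal sketch of weighted feature pooling. It is not the authors' exact network: the per-instance quality score is produced by a hypothetical linear scoring layer (parameters `w`, `b` are stand-ins for the learned weight-prediction module described in the paper), and the predicted scores are softmax-normalized before aggregating the image-level features into one template-level vector.

```python
import numpy as np

def weighted_feature_pooling(features, w, b=0.0):
    """Aggregate per-instance features into a single template representation.

    features: (n_instances, d) array of image-level features.
    w, b:     parameters of a hypothetical linear scoring layer that
              predicts one quality score per instance (a stand-in for
              the learned weight-prediction network in the paper).
    """
    scores = features @ w + b                 # (n_instances,) raw scores
    scores = scores - scores.max()            # numerical stability for softmax
    weights = np.exp(scores) / np.exp(scores).sum()  # weights sum to 1
    return weights @ features                 # (d,) template-level vector

# Toy usage: pool 5 eight-dimensional instance features into one vector.
rng = np.random.default_rng(0)
feats = rng.normal(size=(5, 8))
w = rng.normal(size=8)
pooled = weighted_feature_pooling(feats, w)
print(pooled.shape)
```

In contrast to mean or max pooling, the softmax weights let low-quality (e.g., noisy or redundant) instances contribute less to the final template representation.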
Citation
Li, Z., Wu, Y., Abd-Almageed, W., & Natarajan, P. (2019). Weighted Feature Pooling Network in Template-Based Recognition. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11365 LNCS, pp. 436–451). Springer Verlag. https://doi.org/10.1007/978-3-030-20873-8_28