Improving Closed and Open-Vocabulary Attribute Prediction Using Transformers


Abstract

We study recognizing attributes of objects in visual scenes. We consider attributes to be any phrases that describe an object’s physical and semantic properties and its relationships with other objects. Existing work studies attribute prediction in a closed setting with a fixed set of attributes, using models that exploit only limited context. We propose TAP, a new Transformer-based model that can utilize context and predict attributes for multiple objects in a scene in a single forward pass, together with a training scheme that allows the model to learn attribute prediction from image-text datasets. Experiments on VAW, a large closed-vocabulary attribute benchmark, show that TAP outperforms the state of the art by 5.1% mAP. In addition, by utilizing pretrained text embeddings, we extend our model to OpenTAP, which can recognize novel attributes not seen during training. In a large-scale setting, we further show that OpenTAP can predict a large number of seen and unseen attributes, outperforming the large-scale vision-language model CLIP by a decisive margin. The project page is available at https://vkhoi.github.io/TAP.
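To make the open-vocabulary mechanism concrete, here is a minimal sketch (not the authors' code) of how attribute scoring with pretrained text embeddings can work in the spirit of OpenTAP: contextualized object features are compared against text embeddings of attribute phrases, so attributes unseen during training can still be scored at test time. The function name, dimensions, and temperature value below are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def score_attributes(object_feats: torch.Tensor,
                     attr_text_embeds: torch.Tensor,
                     temperature: float = 0.07) -> torch.Tensor:
    """Return per-object, per-attribute scores in [0, 1].

    object_feats:     (num_objects, d) contextualized object embeddings,
                      e.g. Transformer outputs for all objects in a scene.
    attr_text_embeds: (num_attrs, d) pretrained text embeddings of attribute
                      phrases (e.g. "wooden", "next to a tree"), which may
                      include attributes never seen during training.
    """
    obj = F.normalize(object_feats, dim=-1)
    txt = F.normalize(attr_text_embeds, dim=-1)
    # Cosine similarity between every object and every attribute phrase.
    logits = obj @ txt.t() / temperature
    # Attribute prediction is multi-label, so apply a per-attribute sigmoid
    # rather than a softmax over the attribute vocabulary.
    return torch.sigmoid(logits)

# Toy usage: 3 objects in a scene, 5 candidate attribute phrases.
scores = score_attributes(torch.randn(3, 256), torch.randn(5, 256))
print(scores.shape)  # torch.Size([3, 5])
```

Because the attribute vocabulary enters only through the text-embedding matrix, extending the candidate set at inference time is just a matter of embedding new phrases, with no retraining of the scoring head.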

Citation (APA)

Pham, K., Kafle, K., Lin, Z., Ding, Z., Cohen, S., Tran, Q., & Shrivastava, A. (2022). Improving Closed and Open-Vocabulary Attribute Prediction Using Transformers. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 13685 LNCS, pp. 201–219). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-19806-9_12
