ViTAA: Visual-Textual Attributes Alignment in Person Search by Natural Language

Abstract

Person search by natural language aims to retrieve a specific person from a large-scale image pool given a textual description. While most current methods treat the task as holistic matching between visual and textual features, we approach it from an attribute-alignment perspective that grounds specific attribute phrases to their corresponding visual regions. This grounding yields more robust feature learning and a performance boost, since the referred identity can be accurately pinned down by multiple attribute cues. Concretely, our Visual-Textual Attribute Alignment model (dubbed ViTAA) learns to disentangle the feature space of a person into attribute-specific sub-spaces using a lightweight auxiliary attribute segmentation layer. It then aligns these visual features with the textual attributes parsed from the sentences via a novel contrastive learning loss. We validate the ViTAA framework through extensive experiments on person search by natural language and by attribute-phrase queries, on which our system achieves state-of-the-art performance. Code and models are available at https://github.com/Jarr0d/ViTAA.
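For intuition, the alignment step can be sketched as a cross-modal contrastive loss between visual and textual attribute features. The snippet below is a minimal PyTorch sketch, not the paper's implementation: the function name, tensor shapes, and temperature value are illustrative assumptions, and ViTAA's actual loss differs in its sampling and formulation details.

```python
import torch
import torch.nn.functional as F

def attribute_alignment_loss(visual_feats, text_feats, labels, temperature=0.1):
    """InfoNCE-style cross-modal contrastive loss (illustrative sketch only).

    visual_feats: (N, D) visual features from one attribute sub-space.
    text_feats:   (N, D) textual features for the matching attribute phrases.
    labels:       (N,)   person-identity labels; cross-modal pairs sharing
                         an identity are treated as positives.
    """
    v = F.normalize(visual_feats, dim=1)   # unit-normalize both modalities
    t = F.normalize(text_feats, dim=1)
    sim = v @ t.T / temperature            # (N, N) scaled cosine similarities
    # Positive mask: visual anchor i matches textual feature j if same identity.
    pos = labels.unsqueeze(0).eq(labels.unsqueeze(1)).float()
    # Row-wise log-softmax over all textual candidates for each visual anchor.
    log_prob = sim - sim.logsumexp(dim=1, keepdim=True)
    # Average log-probability over each anchor's positives, then negate.
    loss = -(log_prob * pos).sum(dim=1) / pos.sum(dim=1).clamp(min=1)
    return loss.mean()
```

In a full pipeline, one such term would plausibly be computed per attribute sub-space (as the abstract's disentangling step suggests) and summed with the identity and segmentation objectives; consult the released code for the authors' exact formulation.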

Citation (APA)

Wang, Z., Fang, Z., Wang, J., & Yang, Y. (2020). ViTAA: Visual-Textual Attributes Alignment in Person Search by Natural Language. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 12357 LNCS, pp. 402–420). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-030-58610-2_24
