Listen, Look, and Find the One: Robust Person Search with Multimodality Index


Abstract

Person search with one portrait, which attempts to search for targets in arbitrary scenes using a single portrait image at a time, is an essential yet underexplored problem in the multimedia field. Existing approaches, which depend predominantly on the visual information of persons, fail when a person's appearance varies due to complex environments and changes in pose, makeup, and clothing. In contrast to existing methods, in this article we propose an associative multimodality index for person search that uses face, body, and voice information. In the offline stage, an associative network learns the relationships among face, body, and voice information and adaptively estimates the weight of each embedding to construct an appropriate representation. The multimodality index is built from these representations, exploiting the face and voice as long-term keys and the body appearance as a short-term connection. In the online stage, through the multimodality association in the index, we can retrieve all targets using only the facial features of the query portrait. Furthermore, to evaluate our multimodality search framework and facilitate related research, we construct the Cast Search in Movies with Voice (CSM-V) dataset, a large-scale benchmark containing 127K annotated voices corresponding to tracklets from 192 movies. Extensive experiments on the CSM-V dataset show that the proposed multimodality person search framework outperforms state-of-the-art methods.
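The fusion scheme sketched in the abstract can be illustrated with a minimal toy example. This is a hypothetical sketch, not the authors' code: the per-modality confidence logits are fixed here, whereas the paper's associative network would predict them per tracklet, and the embedding dimensions and similarity measure are illustrative assumptions.

```python
# Hypothetical sketch: fuse face, body, and voice embeddings with adaptive
# (softmax) weights, then query the resulting index by face similarity.
import numpy as np

def l2norm(x):
    """Normalize a vector (or rows of a matrix) to unit length."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + 1e-12)

def fuse(face, body, voice, logits):
    # logits: per-modality confidence scores; in the paper these would be
    # predicted adaptively by the associative network, here they are fixed.
    w = np.exp(logits) / np.exp(logits).sum()  # softmax weights
    rep = w[0] * l2norm(face) + w[1] * l2norm(body) + w[2] * l2norm(voice)
    return l2norm(rep)

# Toy index of two tracklets (random 4-D embeddings for illustration only).
rng = np.random.default_rng(0)
index = np.stack([
    fuse(rng.normal(size=4), rng.normal(size=4), rng.normal(size=4),
         np.array([2.0, 0.5, 1.0]))  # e.g. face most reliable here
    for _ in range(2)
])

# Online stage: rank tracklets by cosine similarity to the portrait's
# facial embedding alone.
query_face = l2norm(rng.normal(size=4))
sims = index @ query_face          # cosine similarities (unit vectors)
ranking = np.argsort(-sims)        # best-matching tracklet first
```

The design point the sketch mirrors is that stable cues (face, voice) dominate the long-term representation, while the online query needs only the portrait's facial features.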




Citation (APA)

Wang, X., Liu, W., Chen, J., Wang, X., Yan, C., & Mei, T. (2020). Listen, Look, and Find the One: Robust Person Search with Multimodality Index. ACM Transactions on Multimedia Computing, Communications and Applications, 16(2). https://doi.org/10.1145/3380549
