Abstract
Video surveillance improves public safety by deterring and detecting criminal activity, enabling rapid response, and providing evidence to investigators. An effective way to support these tasks is to retrieve a person from video using a natural language query that describes soft biometrics. State-of-the-art (SOTA) approaches focus on improving retrieval results; as a result, the building blocks of a person retrieval system receive little attention, which puts novice researchers at a disadvantage. This study provides a design methodology by showcasing the block-by-block construction of a person retrieval system using video and natural language. For each subsystem (natural language processing, person detection, attribute recognition, and ranking), we discuss the available design choices, provide empirical evidence, and discuss bottlenecks and solutions. We then select and integrate the best choices to create an end-to-end system. We highlight the integration challenges and demonstrate that the proposed method achieves an average intersection over union and a true positive rate of ≥60%. This is the first study to provide practical guidance to researchers for fast prototyping of person retrieval with subsystem-level understanding while achieving SOTA performance.
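The two reported metrics can be illustrated with a minimal sketch. It assumes axis-aligned boxes given as (x1, y1, x2, y2) and counts a frame as a true positive when its IoU reaches a threshold. The function names and the 0.4 threshold are illustrative assumptions, not values taken from the abstract.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap; zero when the boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def summarize(predicted, ground_truth, threshold=0.4):
    """Return (average IoU, true positive rate) over paired boxes.

    `threshold` is an assumed cut-off for counting a frame as a
    true positive; the abstract does not state the value used.
    """
    if not predicted or len(predicted) != len(ground_truth):
        raise ValueError("need equally sized, non-empty box lists")
    scores = [iou(p, g) for p, g in zip(predicted, ground_truth)]
    average_iou = sum(scores) / len(scores)
    true_positive_rate = sum(s >= threshold for s in scores) / len(scores)
    return average_iou, true_positive_rate
```

Under these assumptions, a retrieval run is summarized by passing one predicted box and one ground-truth box per frame to `summarize`.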
Cite
Chaudhari, J. N., Galiyawala, H., Kuribayashi, M., Sharma, P., & Raval, M. S. (2023). Designing Practical End-to-End System for Soft Biometric-Based Person Retrieval From Surveillance Videos. IEEE Access, 11, 133640–133657. https://doi.org/10.1109/ACCESS.2023.3337108