Pedestrian attributes recognition is a very important problem in video surveillance and video forensics. Traditional methods assume the pedestrian attributes are independent and design handcraft features for each one. In this paper, we propose a joint hierarchical multi-task learning algorithm to learn the relationships among attributes for better recognizing the pedestrian attributes in still images using convolutional neural networks (CNN). We divide the attributes into local and global ones according to spatial and semantic relations, and then consider learning semantic attributes through a hierarchical multi-task CNN model where each CNN in the first layer will predict each group of such local attributes and CNN in the second layer will predict the global attributes. Our multi-task learning framework allows each CNN model to simultaneously share visual knowledge among different groups of attribute categories. Extensive experiments are conducted on two popular and challenging benchmarks in surveillance scenarios, namely, the PETA and RAP pedestrian attributes datasets. On both benchmarks, our framework achieves superior results over the state-of-the-art methods by 88.2 % on PETA and 83.25 % on RAP, respectively.
CITATION STYLE
Fang, W., Chen, J., Lu, T., & Hu, R. (2018). Pedestrian attributes recognition in surveillance scenarios with hierarchical multi-task cnn models. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11165 LNCS, pp. 758–767). Springer Verlag. https://doi.org/10.1007/978-3-030-00767-6_70
Mendeley helps you to discover research relevant for your work.