In this study, we proposed a multitask network architecture that estimates three attributes from a face image: landmarks, head pose, and occlusion. The network is a 2-stacked hourglass with three task-specific heads. We also designed three auxiliary components for the network. The first is the feature pyramid fusion module, which aggregates contextual information from various receptive fields. The second is the interlevel occlusion-aware fusion module, which explicitly fuses intermediate occlusion predictions between subnetworks. The third is the gimbal-lock-free head pose head, which outputs a rotation matrix from a 6D rotation representation. We conducted an ablation study of these auxiliary components to determine their impact on the network. Additionally, we introduced a landmark heatmap scaling approach to avoid falling into local minima. We trained the proposed network on the 300W-LP dataset for landmarks and head pose and on the C-CM dataset for occlusion. We then fine-tuned the network using the 300W or WFLW dataset in place of the 300W-LP dataset for the landmark task. This 2-stage training method improves the accuracy of landmark detection as well as that of the other tasks. In the experiments, we assessed the performance of the proposed network on eight test datasets using task-specific metrics. The results show that the proposed network achieved competitive performance across all the datasets and notably outperformed state-of-the-art methods on the AFLW2000 and Masked 300W datasets.
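To illustrate how a rotation matrix can be obtained from a 6D rotation representation, the following is a minimal sketch in plain Python. It assumes the Gram-Schmidt construction commonly paired with 6D representations; the abstract does not state the exact mapping used by the head pose head, so the function name `rotation_from_6d`, the helper names, and the column ordering are illustrative assumptions rather than the authors' implementation.

```python
import math


def _normalize(v):
    """Scale a 3-vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    if n < 1e-12:
        raise ValueError("cannot normalize a near-zero vector")
    return [x / n for x in v]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return [
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    ]


def rotation_from_6d(r6):
    """Map six raw numbers to a 3x3 rotation matrix (list of rows).

    Assumption: the first three values and the last three values are two
    unnormalized 3-vectors, orthonormalized by Gram-Schmidt and used as
    the first two columns of the matrix.
    """
    a1, a2 = list(r6[:3]), list(r6[3:6])
    b1 = _normalize(a1)
    # Remove the component of a2 that lies along b1, then normalize.
    d = _dot(b1, a2)
    b2 = _normalize([y - d * x for x, y in zip(b1, a2)])
    # The third column is fixed by the right-hand rule.
    b3 = _cross(b1, b2)
    return [[b1[i], b2[i], b3[i]] for i in range(3)]
```

For example, `rotation_from_6d([1, 0, 0, 0, 1, 0])` yields the identity matrix. Because the output is a matrix built by orthonormalization rather than a triple of Euler angles, every non-degenerate input maps to a valid rotation, which is why this kind of representation is free of gimbal lock.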
Kim, Y., Roh, J. H., & Kim, S. (2023). Facial Landmark, Head Pose, and Occlusion Analysis Using Multitask Stacked Hourglass. IEEE Access, 11, 30970–30981. https://doi.org/10.1109/ACCESS.2023.3262247