Accurate head pose estimation from 2D image data is an essential component of applications such as driver monitoring systems, virtual reality technology, and human-computer interaction. It enables a better determination of user engagement and attentiveness. The most accurate head pose estimators are based on Deep Neural Networks that are trained with the supervised approach and rely primarily on the accuracy of training data. The acquisition of real head pose data with a wide variation of yaw, pitch and roll is a challenging task. Publicly available head pose datasets have limitations with respect to size, resolution, annotation accuracy and diversity. In this work, a methodology is proposed to generate pixel-perfect synthetic 2D headshot images rendered from high-quality 3D synthetic facial models with accurate head pose annotations. A diverse range of variations in age, race, and gender are also provided. The resulting dataset includes more than 300k pairs of RGB images with corresponding head pose annotations. A wide range of variations in pose, illumination and background are included. The dataset is evaluated by training a state-of-the-art head pose estimation model and testing against the popular evaluation-dataset Biwi. The results show that training with purely synthetic data generated using the proposed methodology achieves close to state-of-the-art results on head pose estimation which are originally trained on real human facial datasets. As there is a domain gap between the synthetic images and real-world images in the feature space, initial experimental results fall short of the current state-of-the-art. To reduce the domain gap, a semisupervised visual domain adaptation approach is proposed, which simultaneously trains with the labelled synthetic data and the unlabeled real data. When domain adaptation is applied, a significant improvement in model performance is achieved. Additionally, by applying a data fusion-based transfer learning approach, better results are achieved than previously published work on this topic.
CITATION STYLE
Basak, S., Corcoran, P., Khan, F., McDonnell, R., & Schukat, M. (2021). Learning 3D head pose from synthetic data: A semi-supervised approach. IEEE Access, 9, 37557–37573. https://doi.org/10.1109/ACCESS.2021.3063884
Mendeley helps you to discover research relevant for your work.