Facial expression recognition (FER) is an extremely challenging task under unconstrained conditions. In particular, varying head poses dramatically degrade performance because of the large appearance variations they induce in facial expressions. To address this problem, we propose a local attention network (LAN), which adaptively captures the important facial regions under pose variations. The LAN emphasizes the more attentive regions while suppressing regions that do not discriminate between classes. To identify attentive regions, we propose a simple yet efficient method that annotates coarse-level attention guidance maps in an unsupervised manner. The guidance map assigns attention values to regions according to whether their features are deformed by head pose. Further, the attentive regional features obtained by our LAN are combined with the original global features for pose-invariant FER. We validate our method on a controlled multiview dataset (KDEF), three popular in-the-wild datasets (RAF-DB, FERPlus, and AffectNet), and their subsets containing images with pose variations. Extensive experiments show that our LAN largely improves FER performance under pose variations, and our method also performs favorably against previous methods.
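To make the described pipeline concrete, the sketch below shows one plausible way a local attention block could weight regional features with a predicted attention map, supervise that map with a coarse guidance map, and fuse the attended regional features with global features before classification. This is a minimal illustration only; the module and function names (LocalAttentionSketch, guidance_loss) and the specific layer choices are assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LocalAttentionSketch(nn.Module):
    """Hypothetical local attention block: predicts a spatial attention map
    over regional features and fuses attended regional features with global
    features for pose-robust expression classification."""

    def __init__(self, channels: int):
        super().__init__()
        # 1x1 convolution producing a single-channel spatial attention map.
        self.attn_conv = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, regional_feat: torch.Tensor, global_feat: torch.Tensor):
        # regional_feat, global_feat: (B, C, H, W) feature maps.
        attn = torch.sigmoid(self.attn_conv(regional_feat))   # (B, 1, H, W)
        attended = regional_feat * attn                        # emphasize attentive regions
        pooled_local = attended.mean(dim=(2, 3))               # (B, C)
        pooled_global = global_feat.mean(dim=(2, 3))           # (B, C)
        fused = torch.cat([pooled_local, pooled_global], dim=1)  # combined descriptor
        return fused, attn

def guidance_loss(attn: torch.Tensor, guidance: torch.Tensor) -> torch.Tensor:
    # Illustrative supervision of the predicted attention map with an
    # unsupervised, pose-aware coarse guidance map (here a simple MSE).
    return F.mse_loss(attn, guidance)

# Example usage with dummy tensors (batch of 2, 64 channels, 7x7 features).
if __name__ == "__main__":
    block = LocalAttentionSketch(channels=64)
    regional = torch.randn(2, 64, 7, 7)
    global_ = torch.randn(2, 64, 7, 7)
    guidance = torch.rand(2, 1, 7, 7)
    fused, attn = block(regional, global_)
    loss = guidance_loss(attn, guidance)
    print(fused.shape, loss.item())  # torch.Size([2, 128]) and a scalar loss
```

In this reading, the guidance map acts only as a coarse training signal for the attention predictor, while the concatenated local-plus-global descriptor is what a downstream classifier would consume; the actual network design in the paper may differ.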
Cho, S., & Lee, J. (2022). Learning Local Attention With Guidance Map for Pose Robust Facial Expression Recognition. IEEE Access, 10, 85929–85940. https://doi.org/10.1109/ACCESS.2022.3198658