Stress has become an increasingly serious problem in modern society, threatening people's well-being. With video cameras now ubiquitously deployed in our surroundings, detecting stress through contact-free camera sensors offers a cost-effective way to reach large populations without interference from artificial traits and factors. In this study, we leverage users' facial expressions and action motions in video and present a two-level stress detection network (TSDNet). TSDNet first learns face- and action-level representations separately, then fuses them through a stream-weighted integrator with local and global attention for stress identification. To evaluate the performance of TSDNet, we constructed a dataset of 2,092 labeled video clips. Experimental results on this dataset show that: (1) TSDNet outperformed hand-crafted feature-engineering approaches, achieving 85.42% detection accuracy and an 85.28% F1-score, demonstrating the feasibility and effectiveness of using deep learning to analyze facial expressions and action motions; and (2) considering both facial expressions and action motions improved detection accuracy and F1-score by over 7% compared with methods considering only the face or only the action stream.
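To make the two-stream fusion concrete, below is a minimal PyTorch sketch of a stream-weighted integrator in the spirit of the abstract: two feature streams (face-level and action-level) are combined with learned per-stream weights before classification. All module names, dimensions, and the gating mechanism are illustrative assumptions; the paper's actual TSDNet layers and its local/global attention design are not specified here.

```python
# A minimal sketch of weighted two-stream fusion, assuming each stream has
# already been encoded into a fixed-size feature vector by its own network.
import torch
import torch.nn as nn

class StreamWeightedIntegrator(nn.Module):
    """Fuses face- and action-level features with learned per-stream weights."""
    def __init__(self, feat_dim: int, num_classes: int = 2):
        super().__init__()
        # One scalar weight per stream, computed from the concatenated features
        # (a simple stand-in for the paper's attention-based integrator).
        self.gate = nn.Sequential(
            nn.Linear(2 * feat_dim, 2),
            nn.Softmax(dim=-1),
        )
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, face_feat: torch.Tensor, action_feat: torch.Tensor) -> torch.Tensor:
        weights = self.gate(torch.cat([face_feat, action_feat], dim=-1))   # (B, 2)
        fused = weights[:, :1] * face_feat + weights[:, 1:] * action_feat  # (B, D)
        return self.classifier(fused)  # logits: stressed vs. not stressed

# Usage with placeholder features (in the full model, each stream would come
# from its own video encoder):
face_feat = torch.randn(4, 128)    # batch of 4 face-level feature vectors
action_feat = torch.randn(4, 128)  # matching action-level feature vectors
logits = StreamWeightedIntegrator(feat_dim=128)(face_feat, action_feat)
print(logits.shape)  # torch.Size([4, 2])
```

The softmax gate ensures the two stream weights sum to one, so the model can learn, per sample, how much to trust the face stream versus the action stream, which is one plausible reading of why combining both streams outperforms either one alone.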
Zhang, H., Feng, L., Li, N., Jin, Z., & Cao, L. (2020). Video-based stress detection through deep learning. Sensors (Switzerland), 20(19), 1–17. https://doi.org/10.3390/s20195552