Abstract
Depression is a mental health disorder that affects mood and the ability to function in daily life. Its assessment is currently based mainly on subjective questionnaires and clinical judgment. This work aims to infer depression severity by analysing multimodal data, as a tool to support medical experts and patients in depression screening and diagnosis. We introduce a fusion method that estimates Beck Depression Inventory II scores from multiple inputs: facial images, video-based blood volume pulse signals, and speech data. Each modality has its own regression model based on the ResNet-50 architecture. Our approach leverages the synchrony between the regression scores of these models to produce the fused values. Specifically, we compute the Pearson correlation coefficient and the dynamic time warping distance between sliding windows of the score sequences to find the optimal segments for fusion. We evaluate our method on the dataset of the fourth Audio-Visual Emotion Recognition Challenge (AVEC 2014), achieving a Mean Absolute Error of 6.08 and a Root Mean Squared Error of 8.60, both lower than those of each single-modality model.
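The windowed agreement measure described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' released code: the `best_window` helper, the window length, and the toy score sequences are assumptions, and the real method fuses scores from three modality models rather than comparing two.

```python
import numpy as np

def pearson(a, b):
    # Pearson correlation between two equal-length score windows.
    a = np.asarray(a, dtype=float) - np.mean(a)
    b = np.asarray(b, dtype=float) - np.mean(b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def dtw_distance(a, b):
    # Classic O(n*m) dynamic time warping distance with absolute-difference cost.
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(D[n, m])

def best_window(scores_a, scores_b, win=5):
    # Slide a fixed-length window over two per-modality score sequences and
    # return the start index where the models agree most: highest Pearson
    # correlation, breaking ties by lowest DTW distance.
    best_key, best_start = None, None
    for s in range(len(scores_a) - win + 1):
        wa, wb = scores_a[s:s + win], scores_b[s:s + win]
        key = (pearson(wa, wb), -dtw_distance(wa, wb))
        if best_key is None or key > best_key:
            best_key, best_start = key, s
    return best_start
```

Scores inside the selected segment would then be combined (e.g. averaged) to produce the fused depression estimate.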
Citation
Nguyen, L., Cañellas, M. L., Álvarez Casado, C., Wu, X., & Bordallo López, M. (2023). Synchrony-based Depression Score Aggregation from Single-Modality Models. In UbiComp/ISWC 2022 Adjunct - Proceedings of the 2022 ACM International Joint Conference on Pervasive and Ubiquitous Computing and Proceedings of the 2022 ACM International Symposium on Wearable Computers (pp. 198–201). Association for Computing Machinery, Inc. https://doi.org/10.1145/3544793.3563410