Multi-modal adaptive fusion transformer network for the estimation of depression level

54 citations · 49 Mendeley readers

Abstract

Depression is a severe psychological condition that affects millions of people worldwide. As depression has received increasing attention in recent years, developing automatic methods for detecting it has become imperative. Although numerous machine learning methods have been proposed for estimating depression levels from audio, visual, and audiovisual emotion sensing, several challenges remain: it is difficult to extract long-term temporal context from long sequences of audio and visual data, it is difficult to select and fuse useful multi-modal information or features effectively, and it is unclear how to incorporate additional information or tasks to improve estimation accuracy. In this study, we propose a multi-modal adaptive fusion transformer network for estimating depression levels. Transformer-based models have achieved state-of-the-art performance in language understanding and sequence modeling, so the proposed network uses transformers to extract long-term temporal context from uni-modal audio and visual data; this is the first transformer-based approach for depression detection. We also propose an adaptive fusion method that adaptively selects and fuses useful multi-modal features. Furthermore, inspired by recent multi-task learning work, we incorporate an auxiliary task (depression classification) to support the main task of depression level regression (estimation). The effectiveness of the proposed method is validated on a public dataset (the AVEC 2019 Detecting Depression with AI Sub-challenge) in terms of PHQ-8 scores. Experimental results show that the proposed method outperforms current state-of-the-art methods, achieving a concordance correlation coefficient (CCC) of 0.733 on AVEC 2019, 6.2% higher than the best previously reported result (CCC = 0.696).
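
The abstract outlines three ingredients: a transformer encoder per modality for long-range temporal context, adaptive fusion of the resulting audio and visual embeddings, and an auxiliary depression-classification task trained jointly with PHQ-8 regression. The following is a minimal PyTorch sketch of how these pieces could fit together; every module name, dimension, and the gated form of the fusion are illustrative assumptions, not the authors' exact architecture.

```python
# Minimal sketch of the ideas in the abstract: per-modality transformer
# encoders for long-range temporal context, a gated ("adaptive") fusion of
# the two modality embeddings, and two heads for multi-task learning
# (PHQ-8 regression as the main task, depression classification as the
# auxiliary task). All names and dimensions are illustrative assumptions.
import torch
import torch.nn as nn


class ModalityEncoder(nn.Module):
    """Transformer encoder over one modality's feature sequence."""

    def __init__(self, feat_dim: int, d_model: int = 128, n_layers: int = 2):
        super().__init__()
        self.proj = nn.Linear(feat_dim, d_model)
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=4, dim_feedforward=256, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, feat_dim) -> mean-pool over time to one vector.
        h = self.encoder(self.proj(x))
        return h.mean(dim=1)


class AdaptiveFusionNet(nn.Module):
    """Gated fusion of audio/visual embeddings with two task heads."""

    def __init__(self, audio_dim: int, visual_dim: int, d_model: int = 128):
        super().__init__()
        self.audio_enc = ModalityEncoder(audio_dim, d_model)
        self.visual_enc = ModalityEncoder(visual_dim, d_model)
        # "Adaptive fusion" here: a learned gate that weights each modality.
        self.gate = nn.Sequential(nn.Linear(2 * d_model, d_model), nn.Sigmoid())
        self.regressor = nn.Linear(d_model, 1)   # main task: PHQ-8 score
        self.classifier = nn.Linear(d_model, 2)  # auxiliary: depressed / not

    def forward(self, audio, visual):
        a, v = self.audio_enc(audio), self.visual_enc(visual)
        g = self.gate(torch.cat([a, v], dim=-1))
        fused = g * a + (1.0 - g) * v  # element-wise adaptive weighting
        return self.regressor(fused).squeeze(-1), self.classifier(fused)


# Joint multi-task loss: regression (main) plus classification (auxiliary).
model = AdaptiveFusionNet(audio_dim=40, visual_dim=68)
audio = torch.randn(4, 300, 40)   # e.g. 300 frames of 40-d acoustic features
visual = torch.randn(4, 300, 68)  # e.g. 300 frames of 68-d facial features
phq8 = torch.rand(4) * 24         # PHQ-8 scores lie in [0, 24]
labels = (phq8 >= 10).long()      # PHQ-8 >= 10 is a common depression cutoff
score_pred, class_logits = model(audio, visual)
loss = nn.functional.mse_loss(score_pred, phq8) + \
       0.5 * nn.functional.cross_entropy(class_logits, labels)
loss.backward()
```

The gated sum is only one plausible reading of "adaptive fusion"; the key property it illustrates is that the relative weight of the audio and visual streams is learned per sample rather than fixed.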
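For reference, the evaluation metric quoted above is Lin's concordance correlation coefficient (the first reference below). For predictions $x$ with mean $\mu_x$ and variance $\sigma_x^2$, ground-truth scores $y$ with mean $\mu_y$ and variance $\sigma_y^2$, and Pearson correlation $\rho$ between them:

$$\mathrm{CCC} = \frac{2\rho\,\sigma_x\sigma_y}{\sigma_x^2 + \sigma_y^2 + (\mu_x - \mu_y)^2}$$

Unlike Pearson correlation alone, CCC also penalizes mean and scale mismatch between predictions and ground truth, and equals 1 only for perfect agreement.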

References

A concordance correlation coefficient to evaluate reproducibility
The PHQ-9: A new depression diagnostic and severity measure
The PHQ-8 as a measure of current depression in the general population



Citation (APA)

Sun, H., Liu, J., Chai, S., Qiu, Z., Lin, L., Huang, X., & Chen, Y. (2021). Multi-modal adaptive fusion transformer network for the estimation of depression level. Sensors, 21(14), Article 4764. https://doi.org/10.3390/s21144764

Readers' Seniority

PhD / Postgrad / Masters / Doc: 9 (47%)
Researcher: 7 (37%)
Lecturer / Postdoc: 3 (16%)

Readers' Discipline

Engineering: 7 (44%)
Computer Science: 6 (38%)
Psychology: 2 (13%)
Economics, Econometrics and Finance: 1 (6%)
