Abstract
Early screening for COronaVIrus Disease 2019 (COVID-19) is essential to limit its transmission during the pandemic. Recent studies have shown that detecting COVID-19 with computer audition techniques has the potential to deliver a fast, cheap, and ecologically friendly diagnosis. Respiratory sounds and speech may contain rich and complementary information about COVID-19 clinical conditions. We therefore propose training three deep neural networks on three types of sounds (breathing, counting, and vowel) and combining these models to improve performance. More specifically, we employ Convolutional Neural Networks (CNNs) to extract spatial representations from log Mel spectrograms, and the multi-head attention mechanism of a transformer to mine temporal context information from the CNNs' outputs. The experimental results demonstrate that the transformer-based CNNs can effectively detect COVID-19 on the DiCOVA Track-2 database (AUC: 70.0%) and outperform simple CNNs and hybrid CNN-RNNs.
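The abstract does not say how the three sound-specific models are combined. As a minimal sketch, assuming each model outputs a COVID-19 probability per recording and that the probabilities are simply averaged (one common late-fusion choice, not necessarily the authors'), the step could look like the following. The function names, the 0.5 threshold, and all scores are hypothetical and for illustration only.

```python
from statistics import mean


def fuse_scores(breathing, counting, vowel):
    """Average the per-recording probabilities of the three models.

    Assumption: unweighted mean fusion. The source abstract does not
    specify the actual fusion rule.
    """
    if not (len(breathing) == len(counting) == len(vowel)):
        raise ValueError("each model must score the same recordings")
    return [mean(triple) for triple in zip(breathing, counting, vowel)]


def predict(fused, threshold=0.5):
    """Turn fused probabilities into 0/1 labels (threshold is hypothetical)."""
    return [int(p >= threshold) for p in fused]


# Made-up probabilities for three recordings, one list per sound type.
breathing_scores = [0.8, 0.2, 0.6]
counting_scores = [0.6, 0.4, 0.3]
vowel_scores = [0.7, 0.3, 0.3]

fused = fuse_scores(breathing_scores, counting_scores, vowel_scores)
labels = predict(fused)
print(fused, labels)
```

Averaging treats the three sound types as equally informative. A weighted mean or a small classifier trained on the three scores would be alternatives if one sound type proved more reliable.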
Cite
Chang, Y., Ren, Z., & Schuller, B. W. (2021). Transformer-based CNNs: Mining Temporal Context Information for Multi-sound COVID-19 Diagnosis. In Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBS (Vol. 2021-January, pp. 2335–2338). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/EMBC46164.2021.9629552