Transformer-based CNNs: Mining Temporal Context Information for Multi-sound COVID-19 Diagnosis

Abstract

Because of the COronaVIrus Disease 2019 (COVID-19) pandemic, early screening for COVID-19 is essential to prevent its transmission. Recent studies have shown that detecting COVID-19 with computer audition techniques has the potential to provide a fast, cheap, and ecologically friendly diagnosis. Respiratory sounds and speech may contain rich and complementary information about COVID-19 clinical conditions. We therefore propose training three deep neural networks on three types of sounds (breathing, counting, and vowel) and combining these models to improve performance. More specifically, we employ Convolutional Neural Networks (CNNs) to extract spatial representations from log Mel spectrograms, and the multi-head attention mechanism of the transformer to mine temporal context information from the CNNs' outputs. The experimental results demonstrate that the transformer-based CNNs can effectively detect COVID-19 on the DiCOVA Track-2 database (AUC: 70.0%) and outperform simple CNNs and hybrid CNN-RNNs.
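
The multi-sound combination and the AUC metric mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state how the three models are fused, so averaging their output probabilities is an assumption here. The function names and all scores below are hypothetical and are not taken from the DiCOVA data.

```python
from statistics import mean


def ensemble_scores(per_model_scores):
    """Late fusion by averaging each recording's score across models.

    Averaging is an assumed fusion rule, not one confirmed by the abstract.
    """
    return [mean(scores) for scores in zip(*per_model_scores)]


def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Tied pairs count as half a correct ranking.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        (1.0 if p > n else (0.5 if p == n else 0.0))
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical per-recording COVID-19 probabilities from three models,
# one per sound type; 1 = positive, 0 = negative.
labels = [1, 0, 1, 0, 0]
breathing = [0.9, 0.2, 0.4, 0.6, 0.1]
counting = [0.7, 0.3, 0.6, 0.2, 0.3]
vowel = [0.8, 0.4, 0.5, 0.4, 0.2]

fused = ensemble_scores([breathing, counting, vowel])
print("breathing-only AUC:", roc_auc(labels, breathing))
print("fused AUC:", roc_auc(labels, fused))
```

In this toy data the breathing model misranks one pair that the fused scores rank correctly, which is the kind of complementarity the abstract appeals to. It says nothing about the size of the gain on real recordings.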

Cite

Chang, Y., Ren, Z., & Schuller, B. W. (2021). Transformer-based CNNs: Mining Temporal Context Information for Multi-sound COVID-19 Diagnosis. In Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBS (Vol. 2021-January, pp. 2335–2338). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/EMBC46164.2021.9629552
