Semi-supervised training of transformer and causal dilated convolution network with applications to topic classification

Abstract

To address the audio event recognition problem in speech recognition, we propose a decision fusion method based on a Transformer and Causal Dilated Convolutional Network (TCDCN) framework. The method can model sound events over long time spans, capture their temporal correlations, and deal effectively with the sparsity of audio data. Our dataset consists of audio clips cropped from YouTube. To identify audio topics reliably and stably, we compare different features and different loss functions to find the best model configuration. Experimental results across the tested models show that the proposed TCDCN model achieves better recognition results than neural network classifiers and other fusion methods.
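As an illustration of the causal dilated convolution named in the abstract, the following is a minimal pure-Python sketch of the generic operation, not the authors' TCDCN implementation. The function names `causal_dilated_conv1d` and `receptive_field`, the kernel values, and the dilation schedule are all assumptions chosen for the example. It shows why stacking layers with growing dilation lets a network see far back in time while never looking at future samples.

```python
def causal_dilated_conv1d(x, kernel, dilation):
    """Generic causal dilated 1-D convolution (illustrative, hypothetical helper).

    Computes y[t] = sum_k kernel[k] * x[t - k * dilation],
    treating samples before t = 0 as zero, so y[t] never depends on x[t+1:].
    """
    n = len(x)
    y = [0.0] * n
    for t in range(n):
        for k, w in enumerate(kernel):
            src = t - k * dilation  # index of the past sample this tap reads
            if src >= 0:            # implicit zero padding on the left
                y[t] += w * x[src]
    return y


def receptive_field(kernel_size, dilations):
    """Number of input steps one output step can see after stacking layers."""
    return 1 + (kernel_size - 1) * sum(dilations)


if __name__ == "__main__":
    signal = [1.0, 2.0, 3.0, 4.0]
    # With kernel [1, 1] and dilation 2, each output is x[t] + x[t - 2].
    print(causal_dilated_conv1d(signal, [1.0, 1.0], dilation=2))
    # Assumed schedule of doubling dilations, as commonly used in such stacks.
    print(receptive_field(kernel_size=2, dilations=[1, 2, 4, 8]))
```

With a kernel of size 2 and the assumed dilations 1, 2, 4, 8, four layers already cover 16 time steps, which is the property that makes this kind of layer attractive for long audio sequences.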

Citation (APA)

Zeng, J., Zhang, D., Li, Z., & Li, X. (2021). Semi-supervised training of transformer and causal dilated convolution network with applications to topic classification. Applied Sciences (Switzerland), 11(12). https://doi.org/10.3390/app11125712
