Automatic speaking assessment is essential for helping non-native learners acquire native-like pronunciation. It typically comprises two tasks: mispronunciation detection and pronunciation quality assessment. Previous research has usually addressed only one of these aspects; work on multifaceted speaking assessment has been rare, and jointly built models have often suffered reduced performance because local feature details are lost. In this paper, we propose a multi-width band (MB) method and apply it to the Conformer model, which effectively strengthens the model's ability to capture local feature information at different scales. We also use a multi-task learning approach to train a multifaceted speaking assessment model based on GOP features. Experiments were conducted on a self-built monosyllabic Mandarin mispronunciation detection dataset (PSC-MonoSyllable) and an open-source English pronunciation quality assessment dataset (SpeechOcean762). On the PSC-MonoSyllable dataset, the method achieves mispronunciation detection F1 scores of 70.18%, 80.06%, and 79.82% for phonemes, tones, and words, respectively. On the SpeechOcean762 dataset, the method also improves on the baseline model across the phoneme- and grapheme-level correlation metrics for the pronunciation quality assessment task.
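To make the multi-width band idea concrete, the sketch below shows one plausible reading of it: parallel depthwise 1-D convolutions with different kernel widths replacing the single depthwise convolution inside a Conformer convolution module. The class name, kernel widths, and the pointwise fusion layer are illustrative assumptions, not the paper's exact design, which is not detailed in this abstract.

```python
# Illustrative sketch only: the paper's exact multi-width band (MB) design is not
# given in this abstract, so kernel widths, fusion, and placement are assumptions.
import torch
import torch.nn as nn


class MultiWidthBandConv(nn.Module):
    """Parallel depthwise 1-D convolutions with different kernel widths,
    standing in for the single depthwise convolution of a Conformer
    convolution module so that local features are captured at several scales."""

    def __init__(self, channels: int, kernel_sizes=(3, 7, 15)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv1d(channels, channels, k, padding=k // 2, groups=channels)
            for k in kernel_sizes
        )
        # Pointwise convolution fuses the multi-scale branches (assumed fusion rule).
        self.fuse = nn.Conv1d(channels * len(kernel_sizes), channels, kernel_size=1)
        self.norm = nn.BatchNorm1d(channels)
        self.act = nn.SiLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, channels), as in a typical Conformer block.
        x = x.transpose(1, 2)                       # -> (batch, channels, time)
        multi_scale = torch.cat([b(x) for b in self.branches], dim=1)
        out = self.act(self.norm(self.fuse(multi_scale)))
        return out.transpose(1, 2)                  # -> (batch, time, channels)


if __name__ == "__main__":
    gop_features = torch.randn(4, 50, 128)          # e.g. GOP-based input features
    mb = MultiWidthBandConv(channels=128)
    print(mb(gop_features).shape)                   # torch.Size([4, 50, 128])
```

In a multi-task setup such as the one described here, the output of a module like this would feed separate heads (e.g. phoneme, tone, and word mispronunciation labels) trained with a combined loss; the exact head structure and loss weighting are likewise not specified in this section.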
Fan, Z., Li, J., Wumaier, A., Kadeer, Z., & Abdurahman, A. (2023). A Multifaceted Approach to Oral Assessment Based on the Conformer Architecture. IEEE Access, 11, 28318–28329. https://doi.org/10.1109/ACCESS.2023.3255986