Grammatical error correction (GEC) is the task of automatically correcting grammatical errors in text. Many methods have been proposed for this task and have achieved remarkable results. However, most of them focus solely on enhancing textual feature extraction and do not exploit information from other modalities (e.g., speech), which can also provide valuable cues for detecting grammatical errors. To address this gap, we propose a novel framework that integrates both speech and text features to enhance GEC. Specifically, we construct new multimodal GEC datasets for English and German by generating audio from text with advanced text-to-speech models. We then extract acoustic and textual representations with a multimodal encoder consisting of a speech encoder and a text encoder. A mixture-of-experts (MoE) layer selectively aligns the representations from the two modalities, and a dot-product attention mechanism then fuses them into the final multimodal representations. Experimental results on CoNLL14, BEA19 English, and Falko-MERLIN German show that our multimodal GEC models significantly outperform strong baselines and set a new state-of-the-art result on the Falko-MERLIN test set.
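To make the alignment-and-fusion step concrete, the sketch below shows one plausible way to implement an MoE alignment layer followed by dot-product attention fusion in PyTorch. It is a minimal illustration based only on the description above: the module name MoEAlignFuse, the feature dimensions, the number of experts, and the residual-style fusion at the end are assumptions for illustration, not details taken from the paper.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEAlignFuse(nn.Module):
    """Hypothetical MoE alignment + dot-product attention fusion (illustrative only)."""
    def __init__(self, text_dim=768, speech_dim=1024, num_experts=4):
        super().__init__()
        # Each expert projects speech features into the text feature space.
        self.experts = nn.ModuleList(
            [nn.Linear(speech_dim, text_dim) for _ in range(num_experts)]
        )
        # Gating network weights the experts for each speech frame.
        self.gate = nn.Linear(speech_dim, num_experts)

    def forward(self, text_feats, speech_feats):
        # text_feats:   (batch, text_len, text_dim)
        # speech_feats: (batch, speech_len, speech_dim)
        gate_weights = F.softmax(self.gate(speech_feats), dim=-1)      # (B, S, E)
        expert_outs = torch.stack(
            [expert(speech_feats) for expert in self.experts], dim=-2  # (B, S, E, D)
        )
        # Selectively align speech representations to the text space.
        aligned = (gate_weights.unsqueeze(-1) * expert_outs).sum(-2)   # (B, S, D)

        # Dot-product attention: text tokens attend over aligned speech frames.
        scores = torch.matmul(text_feats, aligned.transpose(1, 2))     # (B, T, S)
        attn = F.softmax(scores / aligned.size(-1) ** 0.5, dim=-1)
        speech_context = torch.matmul(attn, aligned)                   # (B, T, D)

        # Fuse: combine the text representation with the attended speech context.
        return text_feats + speech_context

In this sketch, each expert learns a different projection of the speech features into the text space, the gate decides per frame how to mix the experts, and the attended speech context is added to the text representation before it would be passed on to a GEC decoder; the actual model may fuse the modalities differently.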
Fang, T., Hu, J., Wong, D. F., Wan, X., Chao, L. S., & Chang, T. H. (2023). Improving Grammatical Error Correction with Multimodal Feature Integration. In Findings of the Association for Computational Linguistics: ACL 2023 (pp. 9328–9344). Association for Computational Linguistics. https://doi.org/10.18653/v1/2023.findings-acl.594