Abstract
Lexical normalization, in addition to word segmentation and part-of-speech tagging, is a fundamental task for Japanese user-generated text processing. In this paper, we propose a text editing model to solve the three tasks jointly, together with methods of pseudo-labeled data generation to overcome the problem of data deficiency. Our experiments showed that the proposed model achieved better normalization performance when trained on more diverse pseudo-labeled data.
Cite
Higashiyama, S., Utiyama, M., Watanabe, T., & Sumita, E. (2021). A Text Editing Approach to Joint Japanese Word Segmentation, POS Tagging, and Lexical Normalization. In W-NUT 2021 - 7th Workshop on Noisy User-Generated Text, Proceedings of the Conference (pp. 67–80). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2021.wnut-1.9