A Text Editing Approach to Joint Japanese Word Segmentation, POS Tagging, and Lexical Normalization

Shohei Higashiyama; Masao Utiyama; Taro Watanabe; Eiichiro Sumita

Conference Proceedings

W-NUT 2021 - 7th Workshop on Noisy User-Generated Text, Proceedings of the Conference (2021) 67-80

DOI: 10.18653/v1/2021.wnut-1.9

Abstract

Lexical normalization, in addition to word segmentation and part-of-speech tagging, is a fundamental task for Japanese user-generated text processing. In this paper, we propose a text editing model that solves the three tasks jointly, along with methods for generating pseudo-labeled data to overcome the problem of data deficiency. Our experiments showed that the proposed model achieved better normalization performance when trained on more diverse pseudo-labeled data.

Citation (APA)

Higashiyama, S., Utiyama, M., Watanabe, T., & Sumita, E. (2021). A Text Editing Approach to Joint Japanese Word Segmentation, POS Tagging, and Lexical Normalization. In W-NUT 2021 - 7th Workshop on Noisy User-Generated Text, Proceedings of the Conference (pp. 67–80). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2021.wnut-1.9
