Advancing Multi-Criteria Chinese Word Segmentation Through Criterion Classification and Denoising

Abstract

Recent research on multi-criteria Chinese word segmentation (MCCWS) mainly focuses on building complex private structures, adding more handcrafted features, or introducing complex optimization processes. In this work, we show that a simple yet elegant input-hint-based MCCWS model can achieve state-of-the-art (SoTA) performance on several datasets simultaneously. We further propose a novel criterion-denoising objective that slightly lowers the F1 score but achieves SoTA recall on out-of-vocabulary words. Our results establish a simple yet strong baseline for future MCCWS research. Source code is available at https://github.com/IKMLab/MCCWS.
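
To make the input-hint idea concrete, here is a minimal sketch of how a segmentation criterion might be passed to a single shared model as an extra input token. It illustrates the general technique only and is not the authors' implementation: the criterion names, the `[PKU]`-style token format, the BMES tagging scheme, and both function names are assumptions made for this example.

```python
from typing import List

# Hypothetical criterion names; the hint tokens used in the paper may differ.
CRITERIA = {"pku", "msr", "as"}


def add_criterion_hint(sentence: str, criterion: str) -> List[str]:
    """Prepend a criterion hint token to the character sequence.

    One shared model can then condition on the hint rather than keeping
    a separate (private) structure per criterion.
    """
    if criterion not in CRITERIA:
        raise ValueError(f"unknown criterion: {criterion}")
    return [f"[{criterion.upper()}]"] + list(sentence)


def decode_bmes(chars: List[str], tags: List[str]) -> List[str]:
    """Turn per-character BMES tags (an assumed scheme) into words."""
    words: List[str] = []
    current = ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):  # a word ends on E (end) or S (single)
            words.append(current)
            current = ""
    if current:  # flush a trailing, unterminated word
        words.append(current)
    return words


if __name__ == "__main__":
    print(add_criterion_hint("我爱北京", "pku"))
    print(decode_bmes(list("我爱北京"), ["S", "S", "B", "E"]))
```

Under this scheme, the same sentence can be segmented differently by changing only the hint token, with the tags in the usage above standing in for a model's predictions.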

Cite

Chou, T. H., Lin, C. Y., & Kao, H. Y. (2023). Advancing Multi-Criteria Chinese Word Segmentation Through Criterion Classification and Denoising. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (Vol. 1, pp. 6460–6476). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2023.acl-long.356
