Recent research on multi-criteria Chinese word segmentation (MCCWS) mainly focuses on building complex private structures, adding more handcrafted features, or introducing complex optimization processes. In this work, we show that a simple yet elegant input-hint-based MCCWS model can achieve state-of-the-art (SoTA) performance on several datasets simultaneously. We further propose a novel criterion-denoising objective that slightly lowers the F1 score but achieves SoTA recall on out-of-vocabulary words. Our results establish a simple yet strong baseline for future MCCWS research. Source code is available at https://github.com/IKMLab/MCCWS.
CITATION STYLE
Chou, T. H., Lin, C. Y., & Kao, H. Y. (2023). Advancing Multi-Criteria Chinese Word Segmentation Through Criterion Classification and Denoising. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (Vol. 1, pp. 6460–6476). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2023.acl-long.356