Japanese unknown word identification by character-based chunking

17Citations
Citations of this article
72Readers
Mendeley users who have this article in their library.

Abstract

We introduce a character-based chunking for unknown word identification in Japanese text. A major advantage of our method is an ability to detect low frequency unknown words of unrestricted character type patterns. The method is built upon SVM-based chunking, by use of character n-gram and surrounding context of n-best word segmentation candidates from statistical morphological analysis as features. It is applied to newspapers and patent texts, achieving 95% precision and 55-70% recall for newspapers and more than 85% precision for patent texts.

Cite

CITATION STYLE

APA

Asahara, M., & Matsumoto, Y. (2004). Japanese unknown word identification by character-based chunking. In COLING 2004 - Proceedings of the 20th International Conference on Computational Linguistics. Association for Computational Linguistics (ACL). https://doi.org/10.3115/1220355.1220421

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free