Japanese unknown word identification by character-based chunking

Masayuki Asahara; Yuji Matsumoto

Conference Proceedings (Open Access)


COLING 2004 - Proceedings of the 20th International Conference on Computational Linguistics (2004)

DOI: 10.3115/1220355.1220421


Abstract

We introduce a character-based chunking method for identifying unknown words in Japanese text. A major advantage of the method is its ability to detect low-frequency unknown words with unrestricted character-type patterns. The method builds on SVM-based chunking, using as features character n-grams and the surrounding context of n-best word segmentation candidates produced by statistical morphological analysis. Applied to newspaper and patent texts, it achieves 95% precision and 55-70% recall on newspapers and more than 85% precision on patent texts.
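As a rough illustration of what character-based chunking involves, the sketch below builds character n-gram features around one position and decodes per-character tags into word spans. It is not the authors' implementation: the B/I/O tag set, the window size, the feature names, and the functions `char_ngram_features` and `decode_chunks` are all assumptions made for this example. The SVM classifier and the n-best segmentation features mentioned in the abstract are omitted.

```python
from typing import List, Tuple

PAD = "<PAD>"  # hypothetical padding symbol for positions outside the text


def _char_at(text: str, j: int) -> str:
    """Return the character at index j, or PAD if j is out of range."""
    return text[j] if 0 <= j < len(text) else PAD


def char_ngram_features(text: str, i: int, window: int = 2) -> List[str]:
    """Character unigram and bigram features around position i.

    The feature template is an assumption for illustration only.
    """
    feats = []
    # Unigrams at relative offsets -window .. +window.
    for off in range(-window, window + 1):
        feats.append(f"c[{off}]={_char_at(text, i + off)}")
    # Bigrams covering adjacent offset pairs inside the same window.
    for off in range(-window, window):
        pair = _char_at(text, i + off) + _char_at(text, i + off + 1)
        feats.append(f"c[{off}:{off + 1}]={pair}")
    return feats


def decode_chunks(text: str, tags: List[str]) -> List[Tuple[int, int, str]]:
    """Turn per-character B/I/O tags into (start, end, surface) spans.

    B opens a chunk, I continues it, O is outside any chunk.
    An I with no open chunk is treated as opening one.
    """
    spans = []
    start = None
    for i, tag in enumerate(tags):
        if tag == "B":
            if start is not None:
                spans.append((start, i, text[start:i]))
            start = i
        elif tag == "I":
            if start is None:
                start = i
        else:
            if start is not None:
                spans.append((start, i, text[start:i]))
                start = None
    if start is not None:
        spans.append((start, len(text), text[start:]))
    return spans


text = "東京スカイツリーに行く"
# Hand-written tags standing in for classifier output.
tags = ["O", "O", "B", "I", "I", "I", "I", "I", "O", "O", "O"]
chunks = decode_chunks(text, tags)
features = char_ngram_features(text, 2)
```

Here `chunks` holds the single span covering "スカイツリー". In a full system, each character's feature list would be passed to a trained classifier that predicts the tag, and the decoding step would then recover the candidate unknown words.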

Citation (APA)

Asahara, M., & Matsumoto, Y. (2004). Japanese unknown word identification by character-based chunking. In COLING 2004 - Proceedings of the 20th International Conference on Computational Linguistics. Association for Computational Linguistics (ACL). https://doi.org/10.3115/1220355.1220421
