Simultaneous character-cluster-based word segmentation and named entity recognition in Thai language

Abstract

Named entity recognition in inherent-vowel alphabetic languages such as Burmese, Khmer, Lao, Tamil, Telugu, Bali, and Thai is difficult because there are no explicit boundaries between words or sentences. This paper presents a novel method that exploits the concept of character clusters, sequences of inseparable characters: it groups characters into clusters and uses statistics over characters and their clusters to extract Thai words and recognize named entities simultaneously. The method integrates two phases, a word-segmentation model and a named-entity-recognition model. Context features are used to learn the parameters of these two discriminative probabilistic models, namely conditional random fields (CRFs), which rank the generated sets of word and named-entity candidates. The experimental results show that our method significantly improves word segmentation and named entity recognition, with F-measures of 96.14% and 83.68%, respectively. © 2011 Springer-Verlag.
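
To make the idea of a character cluster concrete, the following is a minimal sketch of grouping Thai characters into inseparable units. It is not the authors' rule set, which the abstract does not give. It assumes two simplified rules: a leading vowel (U+0E40 to U+0E44) binds to the character that follows it, and dependent vowels, tone marks, and other diacritics (U+0E30 to U+0E3A and U+0E47 to U+0E4E) bind to the cluster before them. The function name `thai_character_clusters` is hypothetical.

```python
# Simplified, assumed rules; real Thai character-cluster grammars are richer.
LEADING_VOWELS = {chr(c) for c in range(0x0E40, 0x0E45)}   # written before the consonant
ATTACHING_MARKS = (
    {chr(c) for c in range(0x0E30, 0x0E3B)}    # dependent vowels and phinthu
    | {chr(c) for c in range(0x0E47, 0x0E4F)}  # tone marks and other diacritics
)


def thai_character_clusters(text):
    """Split text into clusters of characters that should not be separated."""
    clusters = []
    previous_was_leading_vowel = False
    for ch in text:
        if clusters and (ch in ATTACHING_MARKS or previous_was_leading_vowel):
            # Either a mark that depends on the preceding cluster, or the
            # base character that a leading vowel was waiting for.
            clusters[-1] += ch
        else:
            clusters.append(ch)
        previous_was_leading_vowel = ch in LEADING_VOWELS
    return clusters


if __name__ == "__main__":
    # Print the clusters of a short Thai string, separated by "|".
    print("|".join(thai_character_clusters("เป็นไทย")))
```

In a pipeline like the one the abstract describes, clusters of this kind, rather than single characters, would serve as the units over which candidate words and named entities are generated and then ranked by the CRF models.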

Cite

Tongtep, N., & Theeramunkong, T. (2011). Simultaneous character-cluster-based word segmentation and named entity recognition in Thai language. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 6746 LNAI, pp. 216–225). https://doi.org/10.1007/978-3-642-24788-0_20
